Deep Learning and Computer Vision for Object Recognition

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 October 2023) | Viewed by 11171

Special Issue Editors


Guest Editor
Department of Mathematical, Physical and Computer Sciences, University of Parma, 43124 Parma, Italy
Interests: computer science; feature extraction; deep learning; meta-learning; computer vision

Guest Editor
Department of Mathematical, Physical and Computer Sciences, University of Parma, 43124 Parma, Italy
Interests: computer science; bioinformatics; computational biology; parallel computing; graph theory; data integration

Guest Editor
Department of Mathematical, Physical and Computer Sciences, University of Parma, 43124 Parma, Italy
Interests: big data; data analysis; health data analysis; data mining; information retrieval; machine learning; deep learning

Special Issue Information

Dear Colleagues,

In the last decade, we have witnessed the increasing significance of deep learning techniques and deep neural network architectures in artificial intelligence (AI) research, especially in the field of computer vision. These methods have contributed to important advances in image processing and pattern recognition (e.g., object detection), becoming a de facto standard for such tasks. Deep learning for computer vision is still a fast-growing scientific branch, as shown by recent work on transformers and ConvNet models. However, applying these cutting-edge technologies usually requires a large amount of well-balanced data, which often needs to be labeled, as well as a great deal of computational resources. The task of object recognition, that is, the identification of specific objects within an image or frame sequence, aims to localize and classify items of interest in a wide range of applications, from mobile games to industrial vision systems integrated into production lines. At present, deep learning techniques are often involved in object recognition. However, such a wide variety of application domains calls for innovative approaches that can work under few-data conditions, in contrast with classical deep learning requirements.

This Special Issue aims to explore recent advances and trends in the use of deep learning and computer vision methods for object recognition, and seeks original contributions that point out possible ways to deal with scarce and heterogeneous input data, as well as the variability of input domains. This includes but is not limited to meta-learning techniques, one-shot or few-shot learning, data augmentation, and fast or real-time object detection.

Dr. Eleonora Iotti
Dr. Vincenzo Bonnici
Dr. Flavio Bertini
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • object detection
  • meta-learning
  • computer vision
  • artificial intelligence

Published Papers (9 papers)


Research

20 pages, 6038 KiB  
Article
A Lightweight Forest Pest Image Recognition Model Based on Improved YOLOv8
by Tingyao Jiang and Shuo Chen
Appl. Sci. 2024, 14(5), 1941; https://doi.org/10.3390/app14051941 - 27 Feb 2024
Viewed by 730
Abstract
In response to the shortcomings of traditional pest detection methods, such as inadequate accuracy and slow detection speeds, a lightweight forestry pest image recognition model based on an improved YOLOv8 architecture is proposed. Initially, given the limited availability of real deep forest pest image data in the wild, data augmentation techniques, including random rotation, translation, and Mosaic, are employed to expand and enhance the dataset. Subsequently, the traditional Conv (convolution) layers in the neck module of YOLOv8 are replaced with lightweight GSConv, and the Slim Neck design paradigm is utilized for reconstruction to reduce computational costs while preserving model accuracy. Furthermore, the CBAM attention mechanism is introduced into the backbone network of YOLOv8 to enhance the feature extraction of crucial information, thereby improving detection accuracy. Finally, WIoU is employed as a replacement for the traditional CIoU to enhance the overall performance of the detector. The experimental results demonstrate that the improved model exhibits a significant advantage in the field of forestry pest detection, achieving precision and recall rates of 98.9% and 97.6%, respectively, surpassing the current mainstream network models.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)
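As a rough illustration of the Mosaic augmentation the abstract mentions, the sketch below tiles four images into a single 2x2 composite, so a detector trained on the result sees objects at shifted positions and in altered contexts. It is a deliberately simplified toy (fixed grid, no random scaling, no bounding-box remapping), not the implementation used in the paper.

```python
def mosaic(images, out_size):
    """Tile four equally sized images (given as H x W grids of pixel
    values) into one 2x2 composite of size out_size x out_size."""
    half = out_size // 2
    canvas = [[0] * out_size for _ in range(out_size)]
    # Quadrant origins: top-left, top-right, bottom-left, bottom-right.
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (oy, ox) in zip(images, offsets):
        for y in range(half):
            for x in range(half):
                canvas[oy + y][ox + x] = img[y][x]
    return canvas
```

In practice the source images are randomly scaled and cropped before tiling, and the ground-truth boxes are remapped into the composite's coordinates.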

16 pages, 2661 KiB  
Article
Collaborative Encoding Method for Scene Text Recognition in Low Linguistic Resources: The Uyghur Language Case Study
by Miaomiao Xu, Jiang Zhang, Lianghui Xu, Wushour Silamu and Yanbing Li
Appl. Sci. 2024, 14(5), 1707; https://doi.org/10.3390/app14051707 - 20 Feb 2024
Viewed by 491
Abstract
Current research on scene text recognition primarily focuses on languages with abundant linguistic resources, such as English and Chinese. In contrast, there is relatively limited research dedicated to low-resource languages. Advanced methods for scene text recognition often employ Transformer-based architectures. However, the performance of Transformer architectures is suboptimal when dealing with low-resource datasets. This paper proposes a Collaborative Encoding Method for Scene Text Recognition in the low-resource Uyghur language. The encoding framework comprises three main modules: the Filter module, the Dual-Branch Feature Extraction module, and the Dynamic Fusion module. The Filter module, consisting of a series of upsampling and downsampling operations, performs coarse-grained filtering on input images to reduce the impact of scene noise on the model, thereby obtaining more accurate feature information. The Dual-Branch Feature Extraction module adopts a parallel structure combining Transformer encoding and Convolutional Neural Network (CNN) encoding to capture local and global information. The Dynamic Fusion module employs an attention mechanism to dynamically merge the feature information obtained from the Transformer and CNN branches. To address the scarcity of real data for natural scene Uyghur text recognition, this paper conducted two rounds of data augmentation on a dataset of 7267 real images, resulting in 254,345 and 3,052,140 scene images, respectively. This process partially mitigated the issue of insufficient Uyghur language data, making low-resource scene text recognition research feasible. Experimental results demonstrate that the proposed collaborative encoding approach achieves outstanding performance. Compared to baseline methods, our collaborative encoding approach improves accuracy by 14.1%.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)
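The Dynamic Fusion module described above merges the CNN and Transformer branches with attention weights. A minimal scalar-gate sketch of that idea (the real module is a learned attention block; `w` here is merely a stand-in for its learned parameters) might look like:

```python
import math

def dynamic_fusion(cnn_feat, trf_feat, w):
    """Gate two feature vectors: a sigmoid of a learned score decides how
    much of the CNN branch vs. the Transformer branch survives the merge.
    `w` is a hypothetical learned projection over the concatenated features."""
    score = sum(wi * xi for wi, xi in zip(w, cnn_feat + trf_feat))
    g = 1.0 / (1.0 + math.exp(-score))  # sigmoid gate in (0, 1)
    return [g * a + (1.0 - g) * b for a, b in zip(cnn_feat, trf_feat)]
```

With `w` set to zeros the gate is 0.5 and the fusion degenerates to a plain average of the two branches; training moves the gate toward whichever branch is more informative.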

16 pages, 34705 KiB  
Article
DFP-Net: A Crack Segmentation Method Based on a Feature Pyramid Network
by Linjing Li, Ran Liu, Rashid Ali, Bo Chen, Haitao Lin, Yonglong Li and Hua Zhang
Appl. Sci. 2024, 14(2), 651; https://doi.org/10.3390/app14020651 - 12 Jan 2024
Viewed by 555
Abstract
Timely detection of defects is essential for ensuring the safe and stable operation of concrete buildings. Automatic segmentation of cracks on concrete surfaces is challenging due to the high diversity of crack appearance, the fine detail involved, and the unbalanced proportion of crack pixels to background pixels. In this work, the Double Feature Pyramid Network is designed for high-precision crack segmentation, reaching the state of the art with the following key contributions. First, considering the diversity of crack shapes, the network constructs a feature pyramid containing three feature extraction backbones to extract global feature maps from input images at three scales. In particular, because single-pixel-wide crack regions pose the biggest challenge, a targeted feature pyramid based on high-resolution input is added to extract adequate shallow semantic information. Finally, a cascade feature fusion unit is designed to aggregate the extracted multi-dimensional feature maps and obtain the final prediction. The superior performance of this method over existing crack detection methods has been verified in extensive experiments, with a Pixel Accuracy of 65.99%, an Intersection over Union of 44.71%, and a Recall of 62.95%, providing a reliable and efficient solution for the health monitoring and maintenance of concrete structures.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)

12 pages, 1946 KiB  
Article
A Large-Class Few-Shot Learning Method Based on High-Dimensional Features
by Jiawei Dang, Yu Zhou, Ruirui Zheng and Jianjun He
Appl. Sci. 2023, 13(23), 12843; https://doi.org/10.3390/app132312843 - 30 Nov 2023
Viewed by 697
Abstract
Large-class few-shot learning has a wide range of applications in many fields, such as the medical, power, security, and remote sensing fields. At present, many few-shot learning methods for fewer-class scenarios have been proposed, but little research has been performed for large-class scenarios. In this paper, we propose a large-class few-shot learning method called HF-FSL, which is based on high-dimensional features. Recent theoretical research shows that if the distribution of samples in a high-dimensional feature space meets the conditions of compactness within classes and dispersion between classes, a large-class few-shot learning method has better generalization ability. Inspired by this theory, the basic idea is to use a deep neural network to extract high-dimensional features and unitize them to project the samples onto a hypersphere. A global orthogonal regularization strategy can then be used to make samples of different classes on the hypersphere as orthogonal as possible, achieving compactness within classes and dispersion between classes in the high-dimensional feature space. Experiments on Omniglot, Fungi, and ImageNet demonstrate that the proposed method can effectively improve recognition accuracy in large-class FSL problems.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)
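The unitization and orthogonality idea can be sketched directly: project feature vectors onto the unit hypersphere, then penalize the squared cosine similarity between pairs, which is zero exactly when the representatives are mutually orthogonal. This is a simplified stand-in for the paper's global orthogonal regularizer, applied here to raw vectors rather than network activations.

```python
import math

def unitize(v):
    """Project a feature vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def orthogonal_penalty(feats):
    """Mean squared cosine similarity over all pairs of unitized features.
    Minimizing this pushes the representatives toward mutual orthogonality,
    i.e., dispersion between classes on the hypersphere."""
    units = [unitize(v) for v in feats]
    total, pairs = 0.0, 0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            cos = sum(a * b for a, b in zip(units[i], units[j]))
            total += cos * cos
            pairs += 1
    return total / pairs
```

Orthogonal vectors score 0, collinear ones score 1, so adding this penalty to the training loss spreads class prototypes out across the sphere.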

22 pages, 9343 KiB  
Article
Foreign-Object Detection in High-Voltage Transmission Line Based on Improved YOLOv8m
by Zhenyue Wang, Guowu Yuan, Hao Zhou, Yi Ma and Yutang Ma
Appl. Sci. 2023, 13(23), 12775; https://doi.org/10.3390/app132312775 - 28 Nov 2023
Cited by 3 | Viewed by 911
Abstract
The safe operation of high-voltage transmission lines ensures the power grid's security. Various foreign objects attached to the transmission lines, such as balloons, kites and nesting birds, can significantly affect the safe and stable operation of high-voltage transmission lines. With the advancement of computer vision technology, periodic automatic inspection of foreign objects is efficient and necessary. Existing detection methods have low accuracy because foreign objects attached to the transmission lines are complex, including occlusions, diverse object types, significant scale variations, and complex backgrounds. In response to the practical needs of the Yunnan Branch of China Southern Power Grid Co., Ltd., this paper proposes an improved YOLOv8m-based model for detecting foreign objects on transmission lines. Experiments are conducted on a dataset collected from Yunnan Power Grid. The proposed model enhances the original YOLOv8m by incorporating a Global Attention Module (GAM) into the backbone to focus on occluded foreign objects, replacing the SPPF module with the SPPCSPC module to augment the model's multiscale feature extraction capability, and introducing the Focal-EIoU loss function to address the imbalance between high- and low-quality samples. These improvements accelerate model convergence and enhance detection accuracy. The experimental results demonstrate that our proposed model achieves a 2.7% increase in mAP_0.5, a 4% increase in mAP_0.5:0.95, and a 6% increase in recall.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)

17 pages, 1825 KiB  
Article
Three-Dimensional Human Pose Estimation with Spatial–Temporal Interaction Enhancement Transformer
by Haijian Wang, Qingxuan Shi and Beiguang Shan
Appl. Sci. 2023, 13(8), 5093; https://doi.org/10.3390/app13085093 - 19 Apr 2023
Cited by 1 | Viewed by 1623
Abstract
Three-dimensional human pose estimation is a hot research topic in the field of computer vision. In recent years, significant progress has been made in estimating 3D human pose from monocular video, but there is still much room for improvement in this task owing to the issues of self-occlusion and depth ambiguity. Some previous work has addressed the above problems by investigating spatio-temporal relationships and has made great progress. Based on this, we further explored the spatio-temporal relationship and propose a new method, called STFormer. Our whole framework consists of two main stages: (1) extracting features independently from the temporal and spatial domains; (2) modeling the communication of information across domains. The temporal dependencies are injected into the spatial domain to dynamically modify the spatial structure relationships between joints, and the results are then used to refine the temporal features. After the preceding steps, both spatial and temporal features are strengthened, and the final estimated pose is more precise. We conducted substantial experiments on the well-known Human3.6M dataset, and the results indicated that STFormer outperforms recent methods with an input of nine frames. Compared to PoseFormer, our method reduces the MPJPE by 2.1%. Furthermore, we performed numerous ablation studies to analyze and prove the validity of the constituent modules of STFormer.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)

17 pages, 5396 KiB  
Article
Research on Vehicle Re-Identification Algorithm Based on Fusion Attention Method
by Peng Chen, Shuang Liu and Simon Kolmanič
Appl. Sci. 2023, 13(7), 4107; https://doi.org/10.3390/app13074107 - 23 Mar 2023
Viewed by 1058
Abstract
The specific task of vehicle re-identification is to quickly and correctly match the same vehicle across different scenarios. To address the problems of inter-class similarity and environmental interference in vehicle images from complex scenes, a fusion attention method for vehicle re-identification is put forward, based on the idea of capturing distinguishing detail features. First, the vehicle image is preprocessed to better restore the image's attributes. The processed image is then fed into ResNet50 to extract features from the second and third layers, respectively. Feature fusion is then carried out through a two-layer attention mechanism, producing a network model, named SDLAU-Reid, that focuses on local detail features while also constructing global features. In the training process, a random erasing data augmentation strategy is adopted to improve robustness. The experimental results show that the mAP and rank-k indicators of the model on VeRi-776 and VehicleID are better than those of existing vehicle re-identification algorithms, which verifies the algorithm's effectiveness.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)

15 pages, 3855 KiB  
Article
LC-YOLO: A Lightweight Model with Efficient Utilization of Limited Detail Features for Small Object Detection
by Menghua Cui, Guoliang Gong, Gang Chen, Hongchang Wang, Min Jin, Wenyu Mao and Huaxiang Lu
Appl. Sci. 2023, 13(5), 3174; https://doi.org/10.3390/app13053174 - 01 Mar 2023
Cited by 6 | Viewed by 2121
Abstract
The limited computing resources on edge devices such as Unmanned Aerial Vehicles (UAVs) mean that lightweight object detection algorithms based on convolutional neural networks require significant development. However, lightweight models are challenged by small targets with few available features. In this paper, we propose an LC-YOLO model that uses detailed information about small targets in each layer to improve detection performance. The model is improved from a one-stage detector and contains two optimization modules: Laplace Bottleneck (LB) and Cross-Layer Attention Upsampling (CLAU). The LB module is proposed to enhance shallow features by integrating prior information into the convolutional neural network and maximizing knowledge sharing within the network. CLAU is designed for the pixel-level fusion of deep features and shallow features. Under the combined action of these two modules, the LC-YOLO model achieves better detection performance on the small object detection task. The LC-YOLO model, with 7.30M parameters, achieves an mAP of 94.96% on the remote sensing dataset UCAS-AOD, surpassing the YOLOv5l model with 46.61M parameters. The tiny version of LC-YOLO, with 1.83M parameters, achieves 94.17% mAP, which is close to YOLOv5l. Therefore, the LC-YOLO model can replace many heavyweight networks to complete high-precision small-target detection tasks under limited computing resources, as in the case of mobile edge chips such as UAV onboard chips.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)

20 pages, 26421 KiB  
Article
Deep Learning-Based Algorithm for Recognizing Tennis Balls
by Di Wu and Aiping Xiao
Appl. Sci. 2022, 12(23), 12116; https://doi.org/10.3390/app122312116 - 26 Nov 2022
Cited by 2 | Viewed by 1881
Abstract
In this paper, we adjust the hyperparameters of the training model based on gradient estimation theory, optimize the structure of the model based on the loss function of the Mask R-CNN convolutional network, and propose a scheme that helps a tennis-ball-picking robot perform target recognition, improving the robot's ability to acquire and analyze image information. Suitable image samples of tennis balls are collected and used to train the Mask R-CNN convolutional network, outputting an algorithmic model dedicated to recognizing tennis balls; the final values of the various loss functions after gradient descent are recorded, the iteration graph of the model is drawn, and the behavior of the neural network at different iteration levels is observed. Finally, this improved and optimized algorithm is compared with other tennis ball recognition algorithms. The experimental results show that the improved algorithm based on Mask R-CNN recognizes tennis balls with 92% accuracy between iteration levels 30 and 35, offering higher accuracy and recognition distance than other tennis ball recognition algorithms and confirming the feasibility and applicability of the optimized algorithm.
(This article belongs to the Special Issue Deep Learning and Computer Vision for Object Recognition)