Applications of Computer Vision, Volume II

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 30 June 2024 | Viewed by 17682

Special Issue Editor


Dr. Eva Cernadas
Guest Editor
Centro Singular de Investigación en Tecnoloxías Intelixentes (CITIUS, Research Center of Intelligent Systems), University of Santiago de Compostela, 15782 Santiago de Compostela, Spain
Interests: image segmentation; texture analysis; classification; regression; pattern recognition; applications of computer vision

Special Issue Information

Dear Colleagues,

Computer vision (CV) techniques are widely used by practicing engineers to solve a range of real vision problems. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible, covering practical applications of CV methods in all branches of science and engineering. Submitted papers should report a novel aspect of CV use in a real-world engineering application and should be validated using data sets. There is no restriction on the length of papers. Electronic files and software containing the full details of the calculation or experimental procedure, if they cannot be published in the normal way, can be deposited as supplementary electronic material.

Focal points of the Special Issue include, but are not limited to, innovative applications of CV in:

  • Medical and biological imaging
  • Industrial inspection
  • Robotics
  • Photo and video interpretation
  • Image retrieval
  • Video analysis and annotation
  • Multimedia
  • Sensors and more

Dr. Eva Cernadas
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image and video segmentation
  • image classification
  • video analysis
  • pattern recognition
  • image and video understanding

Published Papers (15 papers)


Research

20 pages, 5603 KiB  
Article
Multi-Scale Fusion Uncrewed Aerial Vehicle Detection Based on RT-DETR
by Minling Zhu and En Kong
Electronics 2024, 13(8), 1489; https://doi.org/10.3390/electronics13081489 - 14 Apr 2024
Viewed by 270
Abstract
With the rapid development of science and technology, uncrewed aerial vehicle (UAV) technology has shown a wide range of application prospects in various fields. The accuracy and real-time performance of UAV target detection play a vital role in ensuring safety and improving the work efficiency of UAVs. Aiming at the challenges faced by the current UAV detection field, this paper proposes the Gathering Cascaded Dilated DETR (GCD-DETR) model, which aims to improve the accuracy and efficiency of UAV target detection. The main innovations of this paper are as follows: (1) The Dilated Re-param Block is creatively applied to the dilatation-wise Residual module, which uses the large kernel convolution and the parallel small kernel convolution together and fuses the feature maps generated by multi-scale perception, greatly improving the feature extraction ability, thereby improving the accuracy of UAV detection. (2) The Gather-and-Distribute mechanism is introduced to effectively enhance the ability of multi-scale feature fusion so that the model can make full use of the feature information extracted from the backbone network and further improve the detection performance. (3) The Cascaded Group Attention mechanism is innovatively introduced, which not only saves the computational cost but also improves the diversity of attention by dividing the attention head in different ways, thus enhancing the ability of the model to process complex scenes. In order to verify the effectiveness of the proposed model, this paper conducts experiments on multiple UAV datasets of complex scenes. The experimental results show that the accuracy of the improved RT-DETR model proposed in this paper on the two UAV datasets reaches 0.956 and 0.978, respectively, which is 2% and 1.1% higher than that of the original RT-DETR model. At the same time, the FPS of the model is also improved by 10 frames per second, which achieves an effective balance between accuracy and speed.
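As an illustrative aside (not the authors' GCD-DETR code), the general idea of fusing parallel convolutions with different receptive fields can be sketched in PyTorch as follows; the channel count and dilation rates are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class MultiScaleDilatedFusion(nn.Module):
    """Toy multi-branch block: parallel 3x3 convolutions with different
    dilation rates emulate large and small receptive fields, and their
    outputs are fused by a 1x1 convolution plus a residual connection."""
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1)) + x  # residual fusion

x = torch.randn(1, 64, 80, 80)
print(MultiScaleDilatedFusion(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```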

21 pages, 24678 KiB  
Article
Efficient Vision Transformer YOLOv5 for Accurate and Fast Traffic Sign Detection
by Guang Zeng, Zhizhou Wu, Lipeng Xu and Yunyi Liang
Electronics 2024, 13(5), 880; https://doi.org/10.3390/electronics13050880 - 25 Feb 2024
Viewed by 725
Abstract
Accurate and fast detection of traffic sign information is vital for autonomous driving systems. However, the YOLOv5 algorithm faces challenges with low accuracy and slow detection when it is used for traffic sign detection. To address these shortcomings, this paper introduces an accurate and fast traffic sign detection algorithm, YOLOv5-Efficient Vision Transformer (EfficientViT). The algorithm focuses on improving both the accuracy and speed of the model by replacing the CSPDarknet backbone of the YOLOv5(s) model with the EfficientViT network. Additionally, the algorithm incorporates the Convolutional Block Attention Module (CBAM) attention mechanism to enhance feature layer information extraction and boost the accuracy of the detection algorithm. To mitigate the adverse effects of low-quality labels on gradient generation and enhance the competitiveness of high-quality anchor frames, a superior gradient gain allocation strategy is employed. Furthermore, the strategy introduces the Wise-IoU (WIoU), a dynamic non-monotonic focusing mechanism for bounding box loss, to further enhance the accuracy and speed of the object detection algorithm. The algorithm's effectiveness is validated through experiments conducted on the 3L-TT100K traffic sign dataset, showcasing a mean average precision (mAP) of 94.1% in traffic sign detection. This mAP surpasses the performance of the YOLOv5(s) algorithm by 4.76% and outperforms the baseline algorithm. Additionally, the algorithm achieves a detection speed of 62.50 frames per second, which is much better than the baseline algorithm.
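For readers unfamiliar with CBAM, a minimal PyTorch sketch of a CBAM-style block (channel attention followed by spatial attention) is given below; the reduction ratio and kernel size are commonly used defaults, not values taken from this paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal CBAM-style block: channel attention then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention: shared MLP over average- and max-pooled descriptors.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: 7x7 conv over channel-wise mean and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(1, 256, 40, 40)
print(CBAM(256)(feat).shape)  # torch.Size([1, 256, 40, 40])
```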

17 pages, 3007 KiB  
Article
Facial Beauty Prediction Combined with Multi-Task Learning of Adaptive Sharing Policy and Attentional Feature Fusion
by Junying Gan, Heng Luo, Junling Xiong, Xiaoshan Xie, Huicong Li and Jianqiang Liu
Electronics 2024, 13(1), 179; https://doi.org/10.3390/electronics13010179 - 30 Dec 2023
Viewed by 621
Abstract
Facial beauty prediction (FBP) is a leading research subject in the field of artificial intelligence (AI), in which computers make facial beauty judgments and predictions similar to those of humans. At present, the methods are mainly based on deep neural networks. However, there still exist some problems such as insufficient label information and overfitting. Multi-task learning uses label information from multiple databases, which increases the utilization of label information and enhances the feature extraction ability of the network. Attentional feature fusion (AFF) combines semantic information and introduces an attention mechanism to reduce the risk of overfitting. In this study, the multi-task learning of an adaptive sharing policy combined with AFF is presented based on the adaptive sharing (AdaShare) network in FBP. First, an adaptive sharing policy is added to multi-task learning with ResNet18 as the backbone network. Second, the AFF is introduced at the short skip connections of the network. The proposed method improves the accuracy of FBP by solving the problems of insufficient label information and overfitting. The experimental results based on the large-scale Asia facial beauty database (LSAFBD) and SCUT-FBP5500 databases show that the proposed method outperforms the single-database single-task baseline and can be applied extensively in image classification and other fields.
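A minimal sketch of hard-parameter-sharing multi-task learning with a ResNet18 backbone is shown below for orientation only; it does not implement the AdaShare policy or the AFF modules, and the per-task class counts are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiTaskFBP(nn.Module):
    """Toy hard-parameter-sharing setup: one ResNet18 backbone, one
    classification head per beauty database (class counts are assumptions)."""
    def __init__(self, num_classes_per_task=(5, 5)):
        super().__init__()
        backbone = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
        self.heads = nn.ModuleList([nn.Linear(512, n) for n in num_classes_per_task])

    def forward(self, x, task: int):
        feat = self.backbone(x).flatten(1)   # (N, 512) shared representation
        return self.heads[task](feat)        # task-specific prediction

model = MultiTaskFBP()
img = torch.randn(4, 3, 224, 224)
print(model(img, task=0).shape)  # torch.Size([4, 5])
```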

17 pages, 4857 KiB  
Article
Two-Stage Progressive Learning for Vehicle Re-Identification in Variable Illumination Conditions
by Zhihe Wu, Zhi Jin and Xiying Li
Electronics 2023, 12(24), 4950; https://doi.org/10.3390/electronics12244950 - 09 Dec 2023
Viewed by 625
Abstract
Vehicle matching in variable illumination environments can be challenging due to the heavy dependence of vehicle appearance on lighting conditions. To address this issue, we propose a two-stage progressive learning (TSPL) framework. In the first stage, illumination-aware metric learning is enforced using a two-branch network via two illumination-specific feature spaces, used to explicitly model differences in lighting. In the second stage, discriminative feature learning is introduced to extract distinguishing features from a given vehicle. This process consists of a local feature extraction attention module, a local constraint, and a balanced sampling strategy. During the metric learning phase, the model combines local features extracted by the attention module with illumination-specific global features to form joint vehicle features. As part of the study, we construct a large-scale dataset, termed VERI-DAN (vehicle re-identification across day and night), to address the current lack of vehicle datasets exhibiting variable lighting conditions. This set is composed of 200,004 images from 16,654 vehicles, collected in various natural illumination environments. Validation experiments conducted with the VERI-DAN and Vehicle-1M datasets demonstrated that our proposed methodology effectively improved vehicle re-identification Rank-1 accuracy.
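As a small illustration of the reported evaluation metric (not the authors' code), Rank-1 accuracy for re-identification can be computed from query and gallery embeddings roughly as follows.

```python
import numpy as np

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery embedding (by cosine
    similarity) carries the same vehicle identity."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    nearest = np.argmax(q @ g.T, axis=1)           # index of most similar gallery item
    return float(np.mean(gallery_ids[nearest] == query_ids))

rng = np.random.default_rng(0)
qf, gf = rng.normal(size=(10, 128)), rng.normal(size=(50, 128))
qid, gid = rng.integers(0, 5, 10), rng.integers(0, 5, 50)
print(rank1_accuracy(qf, qid, gf, gid))
```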

14 pages, 2403 KiB  
Article
DBENet: Dual-Branch Brightness Enhancement Fusion Network for Low-Light Image Enhancement
by Yongqiang Chen, Chenglin Wen, Weifeng Liu and Wei He
Electronics 2023, 12(18), 3907; https://doi.org/10.3390/electronics12183907 - 16 Sep 2023
Viewed by 836
Abstract
In this paper, we propose an end-to-end low-light image enhancement network based on the YCbCr color space to address the issues encountered by existing algorithms when dealing with brightness distortion and noise in the RGB color space. Traditional methods typically enhance the image first and then denoise, but this amplifies the noise hidden in the dark regions, leading to suboptimal enhancement results. To overcome these problems, we utilize the characteristics of the YCbCr color space to convert the low-light image from RGB to YCbCr and design a dual-branch enhancement network. The network consists of a CNN branch and a U-net branch, which are used to enhance the contrast of luminance and chrominance information, respectively. Additionally, a fusion module is introduced for feature extraction and information measurement. It automatically estimates the importance of corresponding feature maps and employs adaptive information preservation to enhance contrast and eliminate noise. Finally, through testing on multiple publicly available low-light image datasets and comparing with classical algorithms, the experimental results demonstrate that the proposed method generates enhanced images with richer details, more realistic colors, and less noise.
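For reference, a minimal NumPy sketch of the RGB-to-YCbCr conversion (full-range BT.601) that underlies this kind of pipeline is given below; it is a generic conversion, not the paper's network.

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Full-range BT.601 RGB -> YCbCr for uint8 images of shape (H, W, 3).
    Enhancement can then operate on Y (luminance) separately from Cb/Cr."""
    m = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.168736, -0.331264,  0.5     ],
                  [ 0.5,      -0.418688, -0.081312]])
    ycbcr = rgb.astype(np.float64) @ m.T
    ycbcr[..., 1:] += 128.0                       # offset the chroma channels
    return np.clip(ycbcr, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
print(rgb_to_ycbcr(img).shape)  # (4, 4, 3)
```

In practice, libraries such as OpenCV provide equivalent conversions (e.g., cv2.cvtColor with COLOR_RGB2YCrCb, which returns the chroma channels in Cr/Cb order).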

16 pages, 1044 KiB  
Article
RSLC-Deeplab: A Ground Object Classification Method for High-Resolution Remote Sensing Images
by Zhimin Yu, Fang Wan, Guangbo Lei, Ying Xiong, Li Xu, Zhiwei Ye, Wei Liu, Wen Zhou and Chengzhi Xu
Electronics 2023, 12(17), 3653; https://doi.org/10.3390/electronics12173653 - 30 Aug 2023
Cited by 1 | Viewed by 870
Abstract
With the continuous advancement of remote sensing technology, the semantic segmentation of different ground objects in remote sensing images has become an active research topic. For complex and diverse remote sensing imagery, deep learning methods have the ability to automatically discern features from image data and capture intricate spatial dependencies, thus outperforming traditional image segmentation methods. To address the problem of low segmentation accuracy in remote sensing image semantic segmentation, this paper proposes a new remote sensing image semantic segmentation network, RSLC-Deeplab, based on DeeplabV3+. Firstly, ResNet-50 is used as the backbone feature extraction network, which can extract deep semantic information more effectively and improve the segmentation accuracy. Secondly, the coordinate attention (CA) mechanism is introduced into the model to improve the feature representation generated by the network by embedding position information into the channel attention mechanism, effectively capturing the relationship between position information and channels. Finally, a multi-level feature fusion (MFF) module based on asymmetric convolution is proposed, which captures and refines low-level spatial features using asymmetric convolution and then fuses them with high-level abstract features to mitigate the influence of background noise and restore the lost detailed information in deep features. The experimental results on the WHDLD dataset show that the mean intersection over union (mIoU) of RSLC-Deeplab reached 72.63%, the pixel accuracy (PA) reached 83.49%, and the mean pixel accuracy (mPA) reached 83.72%. Compared to the original DeeplabV3+, the proposed method achieved a 4.13% improvement in mIoU and outperformed the PSP-NET, U-NET, MACU-NET, and DeeplabV3+ networks.
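As a side note on the reported metrics, PA, mPA, and mIoU can be computed from a segmentation confusion matrix as in the following sketch (generic code, not tied to the WHDLD experiments).

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """PA, mPA and mIoU from a confusion matrix whose rows are ground-truth
    classes and whose columns are predicted classes."""
    tp = np.diag(conf).astype(float)
    gt = conf.sum(axis=1).astype(float)     # ground-truth pixels per class
    pred = conf.sum(axis=0).astype(float)   # predicted pixels per class
    pa = tp.sum() / conf.sum()                               # pixel accuracy
    mpa = np.mean(tp / np.maximum(gt, 1))                    # mean per-class accuracy
    miou = np.mean(tp / np.maximum(gt + pred - tp, 1))       # mean IoU
    return pa, mpa, miou

conf = np.array([[50, 2, 1],
                 [ 3, 40, 5],
                 [ 0, 4, 45]])
print(segmentation_metrics(conf))
```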

18 pages, 8240 KiB  
Article
YOLO-CID: Improved YOLOv7 for X-ray Contraband Image Detection
by Ning Gan, Fang Wan, Guangbo Lei, Li Xu, Chengzhi Xu, Ying Xiong and Wen Zhou
Electronics 2023, 12(17), 3636; https://doi.org/10.3390/electronics12173636 - 28 Aug 2023
Viewed by 1301
Abstract
Currently, X-ray inspection systems may produce false detections due to factors such as the varying sizes of contraband images, complex backgrounds, and blurred edges. To address this issue, we propose the YOLO-CID method for contraband image detection. Firstly, we designed the MP-OD module in the backbone network to enhance the model's ability to extract key information from complex background images. Secondly, at the neck of the network, we designed a simplified version of BiFPN to add cross-scale connection lines in the feature fusion structure, to preserve deeper semantic information and enhance the network's ability to represent objects in low-contrast or occlusion situations. Finally, we added a new object detection layer to improve the model's accuracy in detecting small objects in dense environments. Experimental results on the PIDray public dataset show that the average accuracy rate of the YOLO-CID algorithm is 82.7% and the recall rate is 81.2%, which are 4.9% and 3.2% higher than the YOLOv7 algorithm, respectively. At the same time, the mAP on the CLCXray dataset reached 80.2%. Additionally, it can achieve a real-time detection speed of 40 frames per second and 43 frames per second in real scenes. These results demonstrate the effectiveness of the YOLO-CID algorithm in X-ray contraband detection.
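A hedged sketch of BiFPN-style fast normalized weighted fusion, the mechanism that the simplified BiFPN mentioned above builds on, is shown below; the tensor shapes are arbitrary examples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """BiFPN-style 'fast normalized fusion' of same-shaped feature maps:
    learnable non-negative weights, normalized to sum to one."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.w)                   # keep weights non-negative
        w = w / (w.sum() + self.eps)         # normalize
        return sum(wi * f for wi, f in zip(w, feats))

p1, p2 = torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)
print(WeightedFusion(2)((p1, p2)).shape)  # torch.Size([1, 128, 40, 40])
```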

26 pages, 5695 KiB  
Article
Enhancing the Accuracy of an Image Classification Model Using Cross-Modality Transfer Learning
by Jiaqi Liu, Kwok Tai Chui and Lap-Kei Lee
Electronics 2023, 12(15), 3316; https://doi.org/10.3390/electronics12153316 - 02 Aug 2023
Cited by 1 | Viewed by 822
Abstract
Applying deep learning (DL) algorithms for image classification tasks becomes more challenging with insufficient training data. Transfer learning (TL) has been proposed to address these problems. In theory, TL requires only a small amount of knowledge to be transferred to the target task, but traditional transfer learning often requires the presence of the same or similar features in the source and target domains. Cross-modality transfer learning (CMTL) solves this problem by learning knowledge in a source domain completely different from the target domain, often using a source domain with a large amount of data, which helps the model learn more features. Most existing research on CMTL has focused on image-to-image transfer. In this paper, the CMTL problem is formulated from the text domain to the image domain. Our study started by training two separately pre-trained models in the text and image domains to obtain the network structure. The knowledge of the two pre-trained models was transferred via CMTL to obtain a new hybrid model (combining the BERT and BEiT models). Next, GridSearchCV and 5-fold cross-validation were used to identify the most suitable combination of hyperparameters (batch size and learning rate) and optimizers (SGDM and ADAM) for our model. To evaluate their impact, 48 two-tuple hyperparameters and two well-known optimizers were used. The performance evaluation metrics were validation accuracy, F1-score, precision, and recall. The ablation study confirms that the hybrid model enhanced accuracy by 12.8% compared with the original BEiT model. In addition, the results show that these two hyperparameters can significantly impact model performance.
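As an illustration of the hyperparameter search described above, the sketch below runs GridSearchCV with 5-fold cross-validation over batch size, learning rate, and optimizer; a small scikit-learn MLP on a toy dataset stands in for the BERT+BEiT hybrid, so the grid values are placeholders.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Stand-in estimator: the paper tunes a BERT+BEiT hybrid; a small MLP is used
# here purely to illustrate a (batch size, learning rate) x optimizer grid.
param_grid = {
    "batch_size": [32, 64],
    "learning_rate_init": [1e-3, 1e-4],
    "solver": ["sgd", "adam"],
}
search = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=200, random_state=0),
    param_grid, cv=5, scoring="accuracy", n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```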

16 pages, 3248 KiB  
Article
Three-Dimensional Measurement of Full Profile of Steel Rail Cross-Section Based on Line-Structured Light
by Jiajia Liu, Jiapeng Zhang, Zhongli Ma, Hangtian Zhang and Shun Zhang
Electronics 2023, 12(14), 3194; https://doi.org/10.3390/electronics12143194 - 24 Jul 2023
Viewed by 1036
Abstract
The wear condition of steel rails directly affects the safety of railway operations. Line-structured-light visual measurement technology is used for online measurement of rail wear due to its ability to achieve high-precision dynamic measurements. However, in dynamic measurements, the random deviation of the measurement plane caused by the vibration of the railcar results in changes in the actual measured rail profile relative to its cross-sectional profile, ultimately leading to measurement deviations. To address these issues, this paper proposes a method for three-dimensional measurement of steel rail cross-sectional profiles based on binocular line-structured light. Firstly, calibrated dual cameras are used to simultaneously capture the profiles of both sides of the steel rail in the same world coordinate system, forming the complete rail profile. Then, considering that the wear at the rail waist is zero in actual operation, the coordinates of the circle centers on both sides of the rail waist are connected to form feature vectors. The measured steel rail profile is aligned with the corresponding feature vectors of the standard steel rail model to achieve initial registration; next, the rail profile that has completed the preliminary matching is accurately matched with the target model based on the iterative closest point (ICP) algorithm. Finally, by comparing the projected complete rail profile onto the rail cross-sectional plane with the standard 3D rail model, the amount of wear on the railhead can be obtained. The experimental results indicate that the proposed line-structured-light measurement method for the complete rail profile, when compared to the measurements obtained from the rail wear gauge, exhibits smaller mean absolute deviation (MAD) and root mean square error (RMSE) for both the vertical and lateral dimensions. The MAD values for the vertical and lateral measurements are 0.009 mm and 0.039 mm, respectively, while the RMSE values are 0.011 mm and 0.048 mm. The MAD and RMSE values for the vertical and lateral wear measurements are lower than those obtained using the standard two-dimensional rail profile measurement method. Furthermore, it effectively eliminates the impact of vibrations during the dynamic measurement process, showcasing its practical engineering application value.
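A minimal NumPy/SciPy sketch of the iterative closest point (ICP) step used for profile registration is given below; it is a generic point-to-point ICP on synthetic data, not the authors' rail-specific implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, target, iters=50, tol=1e-8):
    """Align a measured point set (source) to a reference model (target)."""
    tree, src, prev_err = cKDTree(target), source.copy(), np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)                     # closest reference points
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t
        if abs(prev_err - dist.mean()) < tol:
            break
        prev_err = dist.mean()
    return src

target = np.random.rand(500, 3)
theta = np.deg2rad(3.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
source = target @ R.T + np.array([0.02, -0.01, 0.0])
aligned = icp(source, target)
print(cKDTree(target).query(aligned)[0].mean())  # residual distance after alignment
```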

17 pages, 4358 KiB  
Article
A Workpiece-Dense Scene Object Detection Method Based on Improved YOLOv5
by Jiajia Liu, Shun Zhang, Zhongli Ma, Yuehan Zeng and Xueyin Liu
Electronics 2023, 12(13), 2966; https://doi.org/10.3390/electronics12132966 - 05 Jul 2023
Cited by 1 | Viewed by 876
Abstract
Aiming at the problem of detection difficulties caused by the characteristics of high similarity and disorderly arrangement of workpieces in dense scenes of industrial production lines, this paper proposes a workpiece detection method based on improved YOLOv5, which embeds a coordinate attention mechanism in the feature extraction network to enhance the network's focus on important features and enhance the model's ability to pinpoint targets. The pooling structure of the space pyramid has been replaced, which reduces the amount of calculation and further improves the running speed. A weighted bidirectional feature pyramid is introduced in the feature fusion network to realize efficient bidirectional cross-scale connection and weighted feature fusion, and improve the detection ability of small targets and dense targets. The SIoU loss function is used to improve the training speed and further improve the detection performance of the model. The average accuracy of the improved model on the self-built artifact dataset is improved by 5% compared with the original model, and the number of model parameters is 14.6 MB, which is only 0.5 MB higher than the original model. It is proved that the improved model has the characteristics of high detection accuracy, strong robustness, and light weight.
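For orientation, a minimal PyTorch sketch of a coordinate-attention block of the kind embedded in the feature extraction network is shown below; layer sizes follow common defaults rather than the paper.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Minimal coordinate-attention block: directional average pooling along
    height and width embeds positional information into channel attention."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                        # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)    # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                     # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2))) # (n, c, 1, w)
        return x * a_h * a_w

x = torch.randn(1, 64, 32, 48)
print(CoordinateAttention(64)(x).shape)  # torch.Size([1, 64, 32, 48])
```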

18 pages, 2510 KiB  
Article
Improving the Performance of the Single Shot Multibox Detector for Steel Surface Defects with Context Fusion and Feature Refinement
by Yiming Li, Lixin He, Min Zhang, Zhi Cheng, Wangwei Liu and Zijun Wu
Electronics 2023, 12(11), 2440; https://doi.org/10.3390/electronics12112440 - 27 May 2023
Viewed by 1029
Abstract
Strip surface defects have large intraclass and small interclass differences, resulting in the available detection techniques having either a low accuracy or very poor real-time performance. In order to improve the ability for capturing steel surface defects, the context fusion structure introduces the local information of the shallow layer and the semantic information of the deep layer into multiscale feature maps. In addition, for filtering the semantic conflicts and redundancies arising from context fusion, a feature refinement module is introduced in our method, which further improves the detection accuracy. Our experimental results show that this significantly improved the performance. In particular, our method achieved 79.5% mAP and 71 FPS on the public NEU-DET dataset. This means that our method had a higher detection accuracy compared to other techniques.

14 pages, 3644 KiB  
Article
Object Detection Algorithm of UAV Aerial Photography Image Based on Anchor-Free Algorithms
by Qi Hu, Lin Li, Jin Duan, Meiling Gao, Gaotian Liu, Zhiyuan Wang and Dandan Huang
Electronics 2023, 12(6), 1339; https://doi.org/10.3390/electronics12061339 - 11 Mar 2023
Cited by 3 | Viewed by 1822
Abstract
Aiming at the problems of the difficult extraction of small target feature information, complex backgrounds, and variable target scales in unmanned aerial vehicle (UAV) aerial photography images, this paper proposes an anchor-free target detection algorithm based on fully convolutional one-stage object detection (FCOS). For the problem of complex backgrounds, the global context module is introduced in the ResNet50 network, which is combined with feature pyramid networks (FPN) as the backbone feature extraction network to enhance the feature representation of targets in complex backgrounds. To address the problem of the difficult detection of small targets, an adaptive feature balancing sub-network is designed to filter the invalid information generated at all levels of feature fusion, strengthen multi-layer features, and improve the recognition capability of the model for small targets. To address the problem of variable target scales, complete intersection over union (CIoU) loss is used to optimize the regression loss and strengthen the model's ability to locate multi-scale targets. The algorithm is compared quantitatively and qualitatively on the VisDrone dataset. The experiments show that the proposed algorithm improves average precision (AP) by 4.96% compared with the baseline FCOS algorithm, and the detection speed is 35 frames per second (FPS), confirming that the algorithm has satisfactory detection performance and real-time inference speed and effectively alleviates the missed detection and false detection of targets in UAV aerial images.
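As a reference for the CIoU regression loss mentioned above, a self-contained PyTorch sketch is given below (generic formulation, not the authors' training code).

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Complete-IoU loss for boxes given as (x1, y1, x2, y2) tensors of shape (N, 4)."""
    # Intersection and union.
    xi1 = torch.maximum(pred[:, 0], target[:, 0])
    yi1 = torch.maximum(pred[:, 1], target[:, 1])
    xi2 = torch.minimum(pred[:, 2], target[:, 2])
    yi2 = torch.minimum(pred[:, 3], target[:, 3])
    inter = (xi2 - xi1).clamp(min=0) * (yi2 - yi1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Normalized distance between box centers.
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    cw = torch.maximum(pred[:, 2], target[:, 2]) - torch.minimum(pred[:, 0], target[:, 0])
    ch = torch.maximum(pred[:, 3], target[:, 3]) - torch.minimum(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps                 # diagonal of the enclosing box
    # Aspect-ratio consistency term.
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()

p = torch.tensor([[10., 10., 50., 60.]])
t = torch.tensor([[12., 15., 55., 58.]])
print(ciou_loss(p, t))
```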

14 pages, 10285 KiB  
Article
A Vehicle Recognition Model Based on Improved YOLOv5
by Lei Shao, Han Wu, Chao Li and Ji Li
Electronics 2023, 12(6), 1323; https://doi.org/10.3390/electronics12061323 - 10 Mar 2023
Cited by 3 | Viewed by 2935
Abstract
The rapid development of the automobile industry has made life easier for people, but traffic accidents have increased in frequency in recent years, making vehicle safety particularly important. To address this problem, this paper proposes an improved YOLOv5s algorithm for vehicle identification and detection. In order to solve the problems of vanishing training gradients in the YOLOv5s algorithm, difficulty in recognizing small objects, and poor recognition accuracy caused by the bounding-box regression function, these aspects have been enhanced in this article. On the basis of the traditional YOLOv5s algorithm, the ELU activation function is used to replace the original activation function. The attention mechanism module is then added to the YOLOv5s algorithm's backbone network to improve the feature extraction of small and medium-sized objects. The CIoU loss function replaces the original regression function of YOLOv5s, thereby enhancing the convergence rate and measurement precision of the loss function. In this paper, the constructed dataset is utilized to conduct pertinent experiments. The experimental results demonstrate that, compared to the previous algorithm, the mAP of the enhanced YOLOv5s is 3.1% higher, the convergence rate is 0.8% higher, and the loss is 2.5% lower.

17 pages, 5533 KiB  
Article
Material-Aware Path Aggregation Network and Shape Decoupled SIoU for X-ray Contraband Detection
by Nan Xiang, Zehao Gong, Yi Xu and Lili Xiong
Electronics 2023, 12(5), 1179; https://doi.org/10.3390/electronics12051179 - 28 Feb 2023
Cited by 3 | Viewed by 1432
Abstract
X-ray contraband detection plays an important role in the field of public safety. To solve the multi-scale and obscuration problem in X-ray contraband detection, we propose a material-aware path aggregation network to detect and classify contraband in X-ray baggage images. Based on YOLOX, our network integrates two new modules: a multi-scale smoothed atrous convolution (SAC) module and a material-aware coordinate attention (MCA) module. In SAC, an improved receptive-field-enhanced network structure is proposed by combining smoothed atrous convolution, using separate shared convolution, with a parallel branching structure, which allows for the acquisition of multi-scale receptive fields while reducing grid effects. In the MCA, we incorporate a spatial coordinate separation material perception module with a coordinated attention mechanism. The material perception module can extract the material information features in the X and Y dimensions, respectively, which alleviates the obscuration problem by focusing on the distinctive material characteristics. Finally, we design the shape-decoupled SIoU loss function (SD-SIoU) for the shape characteristics of the X-ray contraband. The category decoupling module and the long–short side decoupling module are integrated into the shape loss, which can effectively balance the effect of the long and short sides. We evaluate our approach on the public X-ray contraband SIXray and OPIXray datasets, and the results show that our approach is competitive with other X-ray baggage inspection approaches.

15 pages, 9472 KiB  
Article
Fast Adaptive Binarization of QR Code Images for Automatic Sorting in Logistics Systems
by Rongjun Chen, Weijie Li, Kailin Lan, Jinghui Xiao, Leijun Wang and Xu Lu
Electronics 2023, 12(2), 286; https://doi.org/10.3390/electronics12020286 - 05 Jan 2023
Cited by 1 | Viewed by 1710
Abstract
With the development of technology, QR codes play an important role in information exchange. To address the problem of uneven illumination in automatic sorting in logistics systems, an adaptive binarization method is presented. The proposed method adaptively defines the block window size for local binarization based on the traits of the QR code, and it takes advantage of integral images to calculate the sum of gray values in a block. The method can binarize the QR code with high quality and speed under uneven illumination. Compared with several existing algorithms, the proposed method is shown to be more effective. The experimental results validate that the proposed method has a higher recognition accuracy and is more efficient in binarization.
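A minimal NumPy sketch of integral-image-based local binarization in the spirit of the method described above is shown below; the block size and threshold ratio are assumptions.

```python
import numpy as np

def adaptive_binarize(gray: np.ndarray, block: int = 31, ratio: float = 0.85):
    """Bradley-style local thresholding: each pixel is compared with the mean
    of a surrounding block, obtained in O(1) from an integral image, which
    tolerates uneven illumination."""
    h, w = gray.shape
    integral = np.pad(gray.astype(np.int64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    r = block // 2
    ys, xs = np.mgrid[0:h, 0:w]
    y1, y2 = np.clip(ys - r, 0, h), np.clip(ys + r + 1, 0, h)
    x1, x2 = np.clip(xs - r, 0, w), np.clip(xs + r + 1, 0, w)
    block_sum = (integral[y2, x2] - integral[y1, x2]
                 - integral[y2, x1] + integral[y1, x1])
    count = (y2 - y1) * (x2 - x1)
    mean = block_sum / count
    return (gray > mean * ratio).astype(np.uint8) * 255

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(adaptive_binarize(img).shape, np.unique(adaptive_binarize(img)))
```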
