Topic Editors

  • Prof. Dr. Antonio Fernández-Caballero, Instituto de Investigación en Informática de Albacete, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
  • Prof. Dr. Byung-Gyu Kim, Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea

Applied Computer Vision and Pattern Recognition: 2nd Volume

Abstract submission deadline: closed (30 June 2023)
Manuscript submission deadline: closed (30 September 2023)
Viewed by: 19290

Topic Information

Dear Colleagues,

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Computer vision tasks include methods for acquiring digital images (through image sensors), processing them, and analyzing them to reach an understanding of their content. In general, the field deals with extracting high-dimensional data from the real world in order to produce numerical or symbolic information that a computer can interpret. In this interpretation step, computer vision is closely related to pattern recognition.

Indeed, pattern recognition is the process of recognizing patterns in data by means of machine learning algorithms. It can be defined as the identification and classification of meaningful patterns in data based on the extraction and comparison of characteristic properties or features of those data. Pattern recognition is a very important area of research and application, underpinning developments in related fields such as computer vision, image processing, text and document analysis, and neural networks. It is closely related to machine learning and finds applications in rapidly emerging areas such as biometrics, bioinformatics, multimedia data analysis, and, more recently, data science. Nowadays, data-driven approaches (such as deep learning) are widely used to achieve pattern recognition and classification in many applications.

This Topic, on Applied Computer Vision and Pattern Recognition, invites papers on theoretical and applied issues, including, but not limited to, the following areas:

  • Statistical, structural, and syntactic pattern recognition;
  • Neural networks, machine learning, and deep learning;
  • Computer vision, robot vision, and machine vision;
  • Multimedia systems and multimedia content;
  • Biosignal processing, speech processing, image processing, and video processing;
  • Data mining, information retrieval, big data, and business intelligence.

This Topic will present the results of research describing recent advances in both the computer vision and pattern recognition fields.

Prof. Dr. Antonio Fernández-Caballero
Prof. Dr. Byung-Gyu Kim
Topic Editors

Keywords

  • pattern recognition
  • neural networks, machine learning
  • deep learning, artificial intelligence
  • computer vision
  • multimedia
  • data mining
  • signal processing
  • image processing

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Applied Sciences (applsci) | 2.7 | 4.5 | 2011 | 15.8 days | CHF 2300
Electronics (electronics) | 2.9 | 4.7 | 2012 | 15.8 days | CHF 2200
Machine Learning and Knowledge Extraction (make) | 3.9 | 8.5 | 2019 | 19.2 days | CHF 1400
Journal of Imaging (jimaging) | 3.2 | 4.4 | 2015 | 21.9 days | CHF 1600
Sensors (sensors) | 3.9 | 6.8 | 2001 | 16.4 days | CHF 2600

Preprints is a platform dedicated to making early versions of research outputs permanently available and citable. MDPI journals allow posting on preprint servers such as Preprints.org prior to publication. For more details about preprints, please visit https://www.preprints.org.

Published Papers (22 papers)

Article
Few-Shot Air Object Detection Network
Electronics 2023, 12(19), 4133; https://doi.org/10.3390/electronics12194133 - 04 Oct 2023
Viewed by 108
Abstract
Focusing on the problem of low detection precision caused by the few-shot and multi-scale characteristics of air objects, we propose a few-shot air object detection network (FADNet). We first use a transformer as the backbone network of the model and then build a multi-scale attention mechanism (MAM) to deeply fuse the W- and H-dimension features extracted from the channel dimension and the local and global features extracted from the spatial dimension with the object features to improve the network’s performance when detecting air objects. Second, the neck network is innovated based on the path aggregation network (PANet), resulting in an improved path aggregation network (IPANet). Our proposed network reduces the information lost during feature transfer by introducing a jump connection, utilizes sparse connection convolution, strengthens feature extraction abilities at all scales, and improves the discriminative properties of air object features at all scales. Finally, we propose a multi-scale regional proposal network (MRPN) that can establish multiple RPNs based on the scale types of the output features, utilizing adaptive convolutions to effectively extract object features at each scale and enhancing the ability to process multi-scale information. The experimental results showed that our proposed method exhibits good performance and generalization, especially in the 1-, 2-, 3-, 5-, and 10-shot experiments, with average accuracies of 33.2%, 36.8%, 43.3%, 47.2%, and 60.4%, respectively. The FADNet solves the problems posed by the few-shot characteristics and multi-scale characteristics of air objects, as well as improving the detection capabilities of the air object detection model. Full article

Article
Dual Histogram Equalization Algorithm Based on Adaptive Image Correction
Appl. Sci. 2023, 13(19), 10649; https://doi.org/10.3390/app131910649 - 25 Sep 2023
Viewed by 170
Abstract
For the visual measurement of moving arm holes in complex working conditions, a histogram equalization algorithm can be used to improve image contrast. To lessen the problems of image brightness shift, image over-enhancement, and gray-level merging that occur with the traditional histogram equalization algorithm, a dual histogram equalization algorithm based on adaptive image correction (AICHE) is proposed. To prevent luminance shifts from occurring during image equalization, the AICHE algorithm protects the average luminance of the input image by improving upon the Otsu algorithm, enabling it to split the histogram. Then, the AICHE algorithm uses the local grayscale correction algorithm to correct the grayscale to prevent the image over-enhancement and gray-level merging problems that arise with the traditional algorithm. It is experimentally verified that the AICHE algorithm can significantly improve the histogram segmentation effect and enhance the contrast and detail information while protecting the average brightness of the input image, and thus the image quality is significantly increased. Full article
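Editor's note: a minimal Python sketch of the histogram-splitting idea behind dual equalization, assuming a single-channel 8-bit image and using OpenCV's Otsu threshold to divide the histogram; the adaptive grayscale-correction stage of AICHE is not reproduced here.

```python
import cv2
import numpy as np

def otsu_split_equalization(gray: np.ndarray) -> np.ndarray:
    """Equalize the sub-histograms below and above the Otsu threshold
    separately, which limits the brightness shift of plain equalization.
    A generic sketch, not the authors' full AICHE pipeline."""
    t, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    t = int(t)
    out = np.empty_like(gray)
    for mask, lo, hi in ((gray <= t, 0, t), (gray > t, t + 1, 255)):
        vals = gray[mask]
        if vals.size == 0:
            continue
        hist = np.bincount(vals - lo, minlength=hi - lo + 1)
        cdf = hist.cumsum().astype(np.float64)
        cdf /= cdf[-1]
        lut = (lo + cdf * (hi - lo)).astype(np.uint8)  # map each sub-range onto itself
        out[mask] = lut[vals - lo]
    return out
```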

Article
Saliency-Driven Hand Gesture Recognition Incorporating Histogram of Oriented Gradients (HOG) and Deep Learning
Sensors 2023, 23(18), 7790; https://doi.org/10.3390/s23187790 - 11 Sep 2023
Viewed by 346
Abstract
Hand gesture recognition is a vital means of communication to convey information between humans and machines. We propose a novel model for hand gesture recognition based on computer vision methods and compare results based on images with complex scenes. While extracting skin color information is an efficient method to determine hand regions, complicated image backgrounds adversely affect recognizing the exact area of the hand shape. Some valuable features like saliency maps, histogram of oriented gradients (HOG), Canny edge detection, and skin color help us maximize the accuracy of hand shape recognition. Considering these features, we proposed an efficient hand posture detection model that improves the test accuracy results to over 99% on the NUS Hand Posture Dataset II and more than 97% on the hand gesture dataset with different challenging backgrounds. In addition, we added noise to around 60% of our datasets. Replicating our experiment, we achieved more than 98% and nearly 97% accuracy on NUS and hand gesture datasets, respectively. Experiments illustrate that the saliency method with HOG has stable performance for a wide range of images with complex backgrounds having varied hand colors and sizes. Full article
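Editor's note: as a rough illustration of the HOG component only, the sketch below extracts HOG descriptors from hand crops and trains a linear SVM; the saliency, Canny, and skin-colour cues and the deep learning stage of the paper are omitted, and the 64x64 crop size and HOG parameters are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC

def hog_descriptor(image: np.ndarray) -> np.ndarray:
    """HOG descriptor of a grayscale hand crop, resized to a fixed size."""
    patch = resize(image, (64, 64), anti_aliasing=True)
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_classifier(train_images, train_labels):
    """train_images: grayscale hand crops (e.g. after saliency/skin-colour
    segmentation); train_labels: gesture classes."""
    X = np.stack([hog_descriptor(img) for img in train_images])
    clf = LinearSVC()  # any classifier (or a CNN head) could be used here
    clf.fit(X, train_labels)
    return clf
```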

Article
Detection of Wheat Yellow Rust Disease Severity Based on Improved GhostNetV2
Appl. Sci. 2023, 13(17), 9987; https://doi.org/10.3390/app13179987 - 04 Sep 2023
Viewed by 365
Abstract
Wheat production safety is facing serious challenges because wheat yellow rust is a worldwide disease. Wheat yellow rust may have no obvious external manifestations in the early stage, and it is difficult to detect whether it is infected, but in the middle and late stages of onset, the symptoms of the disease are obvious, though the severity is difficult to distinguish. A traditional deep learning network model has a large number of parameters, a large amount of calculation, a long time for model training, and high resource consumption, making it difficult to transplant to mobile and edge terminals. To address the above issues, this study proposes an optimized GhostNetV2 approach. First, to increase communication between groups, a channel rearrangement operation is performed on the output of the Ghost module. Then, the first five G-bneck layers of the source model GhostNetV2 are replaced with Fused-MBConv to accelerate model training. Finally, to further improve the model’s identification of diseases, the source attention mechanism SE is replaced by ECA. After experimental comparison, the improved algorithm shortens the training time by 37.49%, and the accuracy rate reaches 95.44%, which is 2.24% higher than the GhostNetV2 algorithm. The detection accuracy and speed have major improvements compared with other lightweight model algorithms. Full article
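Editor's note: the "channel rearrangement" applied to the Ghost module output can be illustrated with a ShuffleNet-style channel shuffle; a hedged PyTorch sketch in which the group count is an assumption, not the paper's setting.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Rearrange channels so that information is mixed across groups,
    a generic sketch of the rearrangement described in the abstract."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()  # interleave channels across groups
    return x.view(b, c, h, w)
```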

Article
A Long Skip Connection for Enhanced Color Selectivity in CNN Architectures
Sensors 2023, 23(17), 7582; https://doi.org/10.3390/s23177582 - 31 Aug 2023
Viewed by 319
Abstract
Some recent studies show that filters in convolutional neural networks (CNNs) have low color selectivity in datasets of natural scenes such as Imagenet. CNNs, bio-inspired by the visual cortex, are characterized by their hierarchical learning structure which appears to gradually transform the representation space. Inspired by the direct connection between the LGN and V4, which allows V4 to handle low-level information closer to the trichromatic input in addition to processed information that comes from V2/V3, we propose the addition of a long skip connection (LSC) between the first and last blocks of the feature extraction stage to allow deeper parts of the network to receive information from shallower layers. This type of connection improves classification accuracy by combining simple-visual and complex-abstract features to create more color-selective ones. We have applied this strategy to classic CNN architectures and quantitatively and qualitatively analyzed the improvement in accuracy while focusing on color selectivity. The results show that, in general, skip connections improve accuracy, but LSC improves it even more and enhances the color selectivity of the original CNN architectures. As a side result, we propose a new color representation procedure for organizing and filtering feature maps, making their visualization more manageable for qualitative color selectivity analysis. Full article
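Editor's note: a minimal PyTorch sketch of a long skip connection between the first and last blocks of a feature extractor; the 1x1 projection and fusion by addition are illustrative choices, not the exact mechanism of the cited paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LongSkipBackbone(nn.Module):
    """Wrap a sequence of convolutional stages and add a long skip connection
    (LSC) from the first stage's output to the last stage's output."""
    def __init__(self, blocks: nn.ModuleList, first_channels: int, last_channels: int):
        super().__init__()
        self.blocks = blocks
        # Project shallow features so that channels match the deepest stage.
        self.project = nn.Conv2d(first_channels, last_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shallow = self.blocks[0](x)        # simple, colour-rich features
        deep = shallow
        for block in self.blocks[1:]:
            deep = block(deep)             # complex, abstract features
        skip = self.project(shallow)
        # Match spatial size before fusing shallow and deep representations.
        skip = F.adaptive_avg_pool2d(skip, deep.shape[-2:])
        return deep + skip
```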

Article
MCMNET: Multi-Scale Context Modeling Network for Temporal Action Detection
Sensors 2023, 23(17), 7563; https://doi.org/10.3390/s23177563 - 31 Aug 2023
Viewed by 354
Abstract
Temporal action detection is a very important and challenging task in the field of video understanding, especially for datasets with significant differences in action duration. The temporal relationships between the action instances contained in these datasets are very complex. For such videos, it is necessary to capture information with a richer temporal distribution as much as possible. In this paper, we propose a dual-stream model that can model contextual information at multiple temporal scales. First, the input video is divided into two resolution streams, followed by a Multi-Resolution Context Aggregation module to capture multi-scale temporal information. Additionally, an Information Enhancement module is added after the high-resolution input stream to model both long-range and short-range contexts. Finally, the outputs of the two modules are merged to obtain features with rich temporal information for action localization and classification. We conducted experiments on three datasets to evaluate the proposed approach. On ActivityNet-v1.3, an average mAP (mean Average Precision) of 32.83% was obtained. On Charades, the best performance was obtained, with an average mAP of 27.3%. On TSU (Toyota Smarthome Untrimmed), an average mAP of 33.1% was achieved. Full article

Article
Infrared Dim and Small Target Sequence Dataset Generation Method Based on Generative Adversarial Networks
Electronics 2023, 12(17), 3625; https://doi.org/10.3390/electronics12173625 - 28 Aug 2023
Viewed by 411
Abstract
With the development of infrared technology, infrared dim and small target detection plays a vital role in precision guidance applications. To address the problems of insufficient dataset coverage and huge actual shooting costs in infrared dim and small target detection methods, this paper proposes a method for generating infrared dim and small target sequence datasets based on generative adversarial networks (GANs). Specifically, first, the improved deep convolutional generative adversarial network (DCGAN) model is used to generate clear images of the infrared sky background. Then, target–background sequence images are constructed using multi-scale feature extraction and improved conditional generative adversarial networks. This method fully considers the infrared characteristics of the target and the background, which can achieve effective expansion of the image data and provide a test set for the infrared small target detection and recognition algorithm. In addition, the classifier’s performance can be improved by expanding the training set, which enhances the accuracy and effect of infrared dim and small target detection based on deep learning. After experimental evaluation, the dataset generated by this method is similar to the real infrared dataset, and the model detection accuracy can be improved after training with the latest deep learning model. Full article

Article
Unification of Road Scene Segmentation Strategies Using Multistream Data and Latent Space Attention
Sensors 2023, 23(17), 7355; https://doi.org/10.3390/s23177355 - 23 Aug 2023
Viewed by 422
Abstract
Road scene understanding, as a field of research, has attracted increasing attention in recent years. The development of road scene understanding capabilities that are applicable to real-world road scenarios has seen numerous complications. This has largely been due to the cost and complexity of achieving human-level scene understanding, at which successful segmentation of road scene elements can be achieved with a mean intersection over union score close to 1.0. There is a need for more of a unified approach to road scene segmentation for use in self-driving systems. Previous works have demonstrated how deep learning methods can be combined to improve the segmentation and perception performance of road scene understanding systems. This paper proposes a novel segmentation system that uses fully connected networks, attention mechanisms, and multiple-input data stream fusion to improve segmentation performance. Results show comparable performance compared to previous works, with a mean intersection over union of 87.4% on the Cityscapes dataset. Full article
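Editor's note: the reported metric, mean intersection over union, can be computed per class as in the short NumPy sketch below for integer label maps; classes absent from both prediction and ground truth are skipped.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union for semantic segmentation maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class not present in either map
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```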

Article
Vision Transformer Customized for Environment Detection and Collision Prediction to Assist the Visually Impaired
J. Imaging 2023, 9(8), 161; https://doi.org/10.3390/jimaging9080161 - 15 Aug 2023
Viewed by 540
Abstract
This paper presents a system that utilizes vision transformers and multimodal feedback modules to facilitate navigation and collision avoidance for the visually impaired. By implementing vision transformers, the system achieves accurate object detection, enabling the real-time identification of objects in front of the user. Semantic segmentation and the algorithms developed in this work provide a means to generate a trajectory vector of all identified objects from the vision transformer and to detect objects that are likely to intersect with the user’s walking path. Audio and vibrotactile feedback modules are integrated to convey collision warning through multimodal feedback. The dataset used to create the model was captured from both indoor and outdoor settings under different weather conditions at different times across multiple days, resulting in 27,867 photos consisting of 24 different classes. Classification results showed good performance (95% accuracy), supporting the efficacy and reliability of the proposed model. The design and control methods of the multimodal feedback modules for collision warning are also presented, while the experimental validation concerning their usability and efficiency stands as an upcoming endeavor. The demonstrated performance of the vision transformer and the presented algorithms in conjunction with the multimodal feedback modules show promising prospects of its feasibility and applicability for the navigation assistance of individuals with vision impairment. Full article

Article
Center Deviation Measurement of Color Contact Lenses Based on a Deep Learning Model and Hough Circle Transform
Sensors 2023, 23(14), 6533; https://doi.org/10.3390/s23146533 - 19 Jul 2023
Viewed by 528
Abstract
Ensuring the quality of color contact lenses is vital, particularly in detecting defects during their production since they are directly worn on the eyes. One significant defect is the “center deviation (CD) defect”, where the colored area (CA) deviates from the center point. Measuring the extent of deviation of the CA from the center point is necessary to detect these CD defects. In this study, we propose a method that utilizes image processing and analysis techniques for detecting such defects. Our approach involves employing semantic segmentation to simplify the image and reduce noise interference and utilizing the Hough circle transform algorithm to measure the deviation of the center point of the CA in color contact lenses. Experimental results demonstrated that our proposed method achieved a 71.2% reduction in error compared with existing research methods. Full article
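Editor's note: a sketch of the measurement idea with OpenCV, assuming the coloured area is already available as a binary mask (e.g. from a segmentation model); the Hough parameter values are illustrative, not the paper's configuration.

```python
import cv2
import numpy as np

def center_deviation(lens_gray: np.ndarray, colored_mask: np.ndarray) -> float:
    """Locate the lens outline with the Hough circle transform and compare
    its centre with the centroid of the coloured area (in pixels)."""
    circles = cv2.HoughCircles(lens_gray, cv2.HOUGH_GRADIENT, dp=1.2,
                               minDist=lens_gray.shape[0] // 2,
                               param1=100, param2=40)
    if circles is None:
        raise ValueError("no lens circle found")
    cx, cy, _ = circles[0, 0]
    ys, xs = np.nonzero(colored_mask)
    mx, my = xs.mean(), ys.mean()              # centroid of the coloured area
    return float(np.hypot(cx - mx, cy - my))   # centre deviation in pixels
```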

Article
Improving Small-Scale Human Action Recognition Performance Using a 3D Heatmap Volume
Sensors 2023, 23(14), 6364; https://doi.org/10.3390/s23146364 - 13 Jul 2023
Viewed by 517
Abstract
In recent years, skeleton-based human action recognition has garnered significant research attention, with proposed recognition or segmentation methods typically validated on large-scale coarse-grained action datasets. However, there remains a lack of research on the recognition of small-scale fine-grained human actions using deep learning methods, which have greater practical significance. To address this gap, we propose a novel approach based on heatmap-based pseudo videos and a unified, general model applicable to all modality datasets. Leveraging anthropometric kinematics as prior information, we extract common human motion features among datasets through an ad hoc pre-trained model. To overcome joint mismatch issues, we partition the human skeleton into five parts, a simple yet effective technique for information sharing. Our approach is evaluated on two datasets, including the public Nursing Activities and our self-built Tai Chi Action dataset. Results from linear evaluation protocol and fine-tuned evaluation demonstrate that our pre-trained model effectively captures common motion features among human actions and achieves steady and precise accuracy across all training settings, while mitigating network overfitting. Notably, our model outperforms state-of-the-art models in recognition accuracy when fusing joint and limb modality features along the channel dimension. Full article
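Editor's note: a sketch of how a heatmap-based pseudo video can be built from 2D joint coordinates: render a Gaussian heatmap per joint per frame and stack the frames; the sigma value and tensor layout are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def joint_heatmaps(joints_xy: np.ndarray, h: int, w: int, sigma: float = 2.0) -> np.ndarray:
    """Render one Gaussian heatmap per joint for a single frame.
    joints_xy has shape (num_joints, 2) with (x, y) pixel coordinates."""
    ys, xs = np.mgrid[0:h, 0:w]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)) for x, y in joints_xy]
    return np.stack(maps)                      # (num_joints, H, W)

def pseudo_video(skeleton_seq: np.ndarray, h: int, w: int) -> np.ndarray:
    """Stack per-frame heatmaps into a (T, num_joints, H, W) volume that a
    video backbone can consume like an ordinary clip."""
    return np.stack([joint_heatmaps(frame, h, w) for frame in skeleton_seq])
```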

Article
Unsupervised Vehicle Re-Identification Based on Cross-Style Semi-Supervised Pre-Training and Feature Cross-Division
Electronics 2023, 12(13), 2931; https://doi.org/10.3390/electronics12132931 - 03 Jul 2023
Viewed by 389
Abstract
Vehicle Re-Identification (Re-ID) based on Unsupervised Domain Adaptation (UDA) has shown promising performance. However, two main issues still exist: (1) existing methods that use Generative Adversarial Networks (GANs) for domain gap alleviation combine supervised learning with hard labels of the source domain, resulting in a mismatch between style transfer data and hard labels; (2) pseudo label assignment in the fine-tuning stage is solely determined by similarity measures of global features using clustering algorithms, leading to inevitable label noise in generated pseudo labels. To tackle these issues, this paper proposes an unsupervised vehicle re-identification framework based on cross-style semi-supervised pre-training and feature cross-division. The framework consists of two parts: cross-style semi-supervised pre-training (CSP) and feature cross-division (FCD) for model fine-tuning. The CSP module generates style transfer data containing source domain content and target domain style using a style transfer network, and then pre-trains the model in a semi-supervised manner using both source domain and style transfer data. A pseudo-label reassignment strategy is designed to generate soft labels assigned to the style transfer data. The FCD module obtains feature partitions through a novel interactive division to reduce the dependence of pseudo-labels on global features, and the final similarity measurement combines the results of partition features and global features. Experimental results on the VehicleID and VeRi-776 datasets show that the proposed method outperforms existing unsupervised vehicle re-identification methods. Compared with the last best method on each dataset, the method proposed in this paper improves the mAP by 0.63% and the Rank-1 by 0.73% on the three sub-datasets of VehicleID on average, and it improves mAP by 0.9% and Rank-1 by 1% on VeRi-776 dataset. Full article
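Editor's note: the baseline pseudo-label assignment that the FCD module refines is, in essence, clustering of global features; a minimal scikit-learn sketch, where eps and min_samples are illustrative values rather than the paper's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def pseudo_labels(features: np.ndarray, eps: float = 0.5) -> np.ndarray:
    """Assign pseudo labels to unlabeled target-domain samples by clustering
    their L2-normalised global features; noise samples get label -1 and are
    usually discarded before fine-tuning."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    return DBSCAN(eps=eps, min_samples=4, metric="euclidean").fit_predict(feats)
```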

Article
Transfer Learning for Sentiment Classification Using Bidirectional Encoder Representations from Transformers (BERT) Model
Sensors 2023, 23(11), 5232; https://doi.org/10.3390/s23115232 - 31 May 2023
Cited by 1 | Viewed by 1368
Abstract
Sentiment is currently one of the most emerging areas of research due to the large amount of web content coming from social networking websites. Sentiment analysis is a crucial process for recommending systems for most people. Generally, the purpose of sentiment analysis is to determine an author’s attitude toward a subject or the overall tone of a document. There is a huge collection of studies that make an effort to predict how useful online reviews will be and have produced conflicting results on the efficacy of different methodologies. Furthermore, many of the current solutions employ manual feature generation and conventional shallow learning methods, which restrict generalization. As a result, the goal of this research is to develop a general approach using transfer learning by applying the “BERT (Bidirectional Encoder Representations from Transformers)”-based model. The efficiency of BERT classification is then evaluated by comparing it with similar machine learning techniques. In the experimental evaluation, the proposed model demonstrated superior performance in terms of outstanding prediction and high accuracy compared to earlier research. Comparative tests conducted on positive and negative Yelp reviews reveal that fine-tuned BERT classification performs better than other approaches. In addition, it is observed that BERT classifiers using batch size and sequence length significantly affect classification performance. Full article
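Editor's note: a minimal fine-tuning sketch for binary sentiment classification with a pretrained BERT model via the Hugging Face transformers library; dataset handling, full-corpus batching, and the hyperparameters (learning rate, batch size, sequence length) are simplified assumptions, not the paper's configuration.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["The food was amazing", "Terrible service, never again"]  # toy examples
labels = torch.tensor([1, 0])

model.train()
batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
optimizer.zero_grad()
outputs = model(**batch, labels=labels)   # returns loss and logits
outputs.loss.backward()
optimizer.step()
```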

Article
Lightweight Multiscale CNN Model for Wheat Disease Detection
Appl. Sci. 2023, 13(9), 5801; https://doi.org/10.3390/app13095801 - 08 May 2023
Cited by 1 | Viewed by 1182
Abstract
Wheat disease detection is crucial for disease diagnosis, pesticide application optimization, disease control, and wheat yield and quality improvement. However, the detection of wheat diseases is difficult due to their various types. Detecting wheat diseases in complex fields is also challenging. Traditional models are difficult to apply to mobile devices because they have large parameters, and high computation and resource requirements. To address these issues, this paper combines the residual module and the inception module to construct a lightweight multiscale CNN model, which introduces the CBAM and ECA modules into the residual block, enhances the model’s attention to diseases, and reduces the influence of complex backgrounds on disease recognition. The proposed method has an accuracy rate of 98.7% on the test dataset, which is higher than classic convolutional neural networks such as AlexNet, VGG16, and InceptionresnetV2 and lightweight models such as MobileNetV3 and EfficientNetb0. The proposed model has superior performance and can be applied to mobile terminals to quickly identify wheat diseases. Full article
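Editor's note: a generic PyTorch sketch of the ECA attention block mentioned in the abstract (global average pooling, a 1D convolution across channels, and a sigmoid gate); the kernel size k = 3 is an assumption, and the block is shown independently of the paper's residual structure.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention block."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = x.mean(dim=(2, 3))                    # (B, C) global average pooling
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # local cross-channel interaction
        return x * torch.sigmoid(y).view(b, c, 1, 1)
```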

Article
Development of an Accurate and Automated Quality Inspection System for Solder Joints on Aviation Plugs Using Fine-Tuned YOLOv5 Models
Appl. Sci. 2023, 13(9), 5290; https://doi.org/10.3390/app13095290 - 23 Apr 2023
Cited by 4 | Viewed by 1038
Abstract
The quality inspection of solder joints on aviation plugs is extremely important in modern manufacturing industries. However, this task is still mostly performed by skilled workers after welding operations, posing the problems of subjective judgment and low efficiency. To address these issues, an accurate and automated detection system using fine-tuned YOLOv5 models is developed in this paper. Firstly, we design an intelligent image acquisition system to obtain the high-resolution image of each solder joint automatically. Then, a two-phase approach is proposed for fast and accurate weld quality detection. In the first phase, a fine-tuned YOLOv5 model is applied to extract the region of interest (ROI), i.e., the row of solder joints to be inspected, within the whole image. With the sliding platform, the ROI is automatically moved to the center of the image to enhance its imaging clarity. Subsequently, another fine-tuned YOLOv5 model takes this adjusted ROI as input and realizes quality assessment. Finally, a concise and easy-to-use GUI has been designed and deployed in real production lines. Experimental results in the actual production line show that the proposed method can achieve a detection accuracy of more than 97.5% with a detection speed of about 0.1 s, which meets the needs of actual production. Full article
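Editor's note: a hedged sketch of the two-phase idea using YOLOv5 models loaded through torch.hub; the weight file names are placeholders for the authors' fine-tuned models, and the crop handling assumes a NumPy image in (H, W, C) layout.

```python
import torch

# Phase 1 locates the row of solder joints (the ROI); phase 2 grades each joint.
roi_model = torch.hub.load("ultralytics/yolov5", "custom", path="roi_weights.pt")
joint_model = torch.hub.load("ultralytics/yolov5", "custom", path="joint_weights.pt")

def inspect(image):
    rois = roi_model(image).xyxy[0]            # (N, 6): x1, y1, x2, y2, conf, cls
    results = []
    for x1, y1, x2, y2, *_ in rois.tolist():
        crop = image[int(y1):int(y2), int(x1):int(x2)]
        results.append(joint_model(crop).xyxy[0])
    return results
```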

Article
FM-STDNet: High-Speed Detector for Fast-Moving Small Targets Based on Deep First-Order Network Architecture
Electronics 2023, 12(8), 1829; https://doi.org/10.3390/electronics12081829 - 12 Apr 2023
Viewed by 603
Abstract
Identifying objects of interest from digital vision signals is a core task of intelligent systems. However, fast and accurate identification of small moving targets in real-time has become a bottleneck in the field of target detection. In this paper, the problem of real-time detection of the fast-moving printed circuit board (PCB) tiny targets is investigated. This task is very challenging because PCB defects are usually small compared to the whole PCB board, and due to the pursuit of production efficiency, the actual production PCB moving speed is usually very fast, which puts higher requirements on the real-time of intelligent systems. To this end, a new model of FM-STDNet (Fast Moving Small Target Detection Network) is proposed based on the well-known deep learning detector YOLO (You Only Look Once) series model. First, based on the SPPNet (Spatial Pyramid Pooling Networks) network, a new SPPFCSP (Spatial Pyramid Pooling Fast Cross Stage Partial Network) spatial pyramid pooling module is designed to adapt to the extraction of different scale size features of different size input images, which helps retain the high semantic information of smaller features; then, the anchor-free mode is introduced to directly classify the regression prediction information and do the structural reparameterization construction to design a new high-speed prediction head RepHead to further improve the operation speed of the detector. The experimental results show that the proposed detector achieves 99.87% detection accuracy at the fastest speed compared to state-of-the-art depth detectors such as YOLOv3, Faster R-CNN, and TDD-Net in the fast-moving PCB surface defect detection task. The new model of FM-STDNet provides an effective reference for the fast-moving small target detection task. Full article
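Editor's note: a minimal PyTorch sketch of an SPPNet-style spatial pyramid pooling block that concatenates max pooling at several kernel sizes; the paper's SPPFCSP variant adds cross-stage-partial connections that are not shown here, and the kernel sizes are illustrative.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: concatenate the input with max-pooled copies
    at several receptive-field sizes to aggregate multi-scale context."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)
```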

Article
Improved Mask R-CNN Multi-Target Detection and Segmentation for Autonomous Driving in Complex Scenes
Sensors 2023, 23(8), 3853; https://doi.org/10.3390/s23083853 - 10 Apr 2023
Cited by 2 | Viewed by 1553
Abstract
Vision-based target detection and segmentation has been an important research topic for environment perception in autonomous driving, but the mainstream target detection and segmentation algorithms have the problems of low detection accuracy and poor mask segmentation quality for multi-target detection and segmentation in complex traffic scenes. To address this problem, this paper improved the Mask R-CNN by replacing the backbone network ResNet with the ResNeXt network with group convolution to further improve the feature extraction capability of the model. Furthermore, a bottom-up path enhancement strategy was added to the Feature Pyramid Network (FPN) to achieve feature fusion, while an efficient channel attention module (ECA) was added to the backbone feature extraction network to optimize the high-level low resolution semantic information graph. Finally, the bounding box regression loss function smooth L1 loss was replaced by CIoU loss to speed up the model convergence and minimize the error. The experimental results showed that the improved Mask R-CNN algorithm achieved 62.62% mAP for target detection and 57.58% mAP for segmentation accuracy on the publicly available CityScapes autonomous driving dataset, which were 4.73% and 3.96% better than the original Mask R-CNN algorithm, respectively. The migration experiments showed that it has good detection and segmentation effects in each traffic scenario of the publicly available BDD autonomous driving dataset. Full article
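Editor's note: a hedged torchvision sketch of the backbone swap described in the abstract, building a Mask R-CNN on a ResNeXt-101 FPN backbone; the class count is an assumption, and the paper's path augmentation, ECA module, and CIoU loss are not reproduced.

```python
from torchvision.models import ResNeXt101_32X8D_Weights
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNeXt-101 (32x8d) backbone with an FPN, initialised from ImageNet weights.
backbone = resnet_fpn_backbone(backbone_name="resnext101_32x8d",
                               weights=ResNeXt101_32X8D_Weights.DEFAULT)
model = MaskRCNN(backbone, num_classes=9)  # e.g. 8 traffic classes + background (assumed)
```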

Article
Insights into Batch Selection for Event-Camera Motion Estimation
Sensors 2023, 23(7), 3699; https://doi.org/10.3390/s23073699 - 03 Apr 2023
Viewed by 923
Abstract
Event cameras measure scene changes with high temporal resolutions, making them well-suited for visual motion estimation. The activation of pixels results in an asynchronous stream of digital data (events), which rolls continuously over time without the discrete temporal boundaries typical of frame-based cameras (where a data packet or frame is emitted at a fixed temporal rate). As such, it is not trivial to define a priori how to group/accumulate events in a way that is sufficient for computation. The suitable number of events can greatly vary for different environments, motion patterns, and tasks. In this paper, we use neural networks for rotational motion estimation as a scenario to investigate the appropriate selection of event batches to populate input tensors. Our results show that batch selection has a large impact on the results: training should be performed on a wide variety of different batches, regardless of the batch selection method; a simple fixed-time window is a good choice for inference with respect to fixed-count batches, and it also demonstrates comparable performance to more complex methods. Our initial hypothesis that a minimal amount of events is required to estimate motion (as in contrast maximization) is not valid when estimating motion with a neural network. Full article
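Editor's note: the two batch-selection strategies compared in the paper, fixed-time windows and fixed-count batches, can be sketched directly over a sorted array of event timestamps; this is a generic illustration, not the authors' training pipeline.

```python
import numpy as np

def fixed_time_batches(timestamps: np.ndarray, window: float):
    """Group event indices into consecutive fixed-duration windows.
    `timestamps` are event times in seconds, assumed sorted."""
    edges = np.arange(timestamps[0], timestamps[-1] + window, window)
    return [np.flatnonzero((timestamps >= lo) & (timestamps < lo + window)) for lo in edges]

def fixed_count_batches(num_events: int, count: int):
    """Group event indices into consecutive batches of a fixed number of events."""
    return [np.arange(i, min(i + count, num_events)) for i in range(0, num_events, count)]
```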

Article
Self-Supervised Facial Motion Representation Learning via Contrastive Subclips
Electronics 2023, 12(6), 1369; https://doi.org/10.3390/electronics12061369 - 13 Mar 2023
Viewed by 799
Abstract
Facial motion representation learning has become an exciting research topic, since biometric technologies are becoming more common in our daily lives. One of its applications is identity verification. After recording a dynamic facial motion video for enrollment, the user needs to show a matched facial appearance and make a facial motion the same as the enrollment for authentication. Some recent research papers have discussed the benefits of this new biometric technology and reported promising results for both static and dynamic facial motion verification tasks. Our work extends the existing approaches and introduces compound facial actions, which contain more than one dominant facial action in one utterance. We propose a new self-supervised pretraining method called contrastive subclips that improves the model performance with these more complex and secure facial motions. The experimental results show that the contrastive subclips method improves upon the baseline approaches, and the model performance for test data can reach 89.7% average precision. Full article
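Editor's note: a generic InfoNCE-style contrastive loss over paired subclip embeddings, where the other samples in the batch serve as negatives; this illustrates the contrastive idea only, not the paper's exact subclip sampling scheme, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.07):
    """Contrastive loss between (B, D) embeddings of matching subclips."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature              # similarity of every pair in the batch
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)       # diagonal pairs are the positives
```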

Review
The Challenges of Recognizing Offline Handwritten Chinese: A Technical Review
Appl. Sci. 2023, 13(6), 3500; https://doi.org/10.3390/app13063500 - 09 Mar 2023
Cited by 2 | Viewed by 1642
Abstract
Offline handwritten Chinese recognition is an important research area of pattern recognition, including offline handwritten Chinese character recognition (offline HCCR) and offline handwritten Chinese text recognition (offline HCTR), which are closely related to daily life. With new deep learning techniques and their combination with other domain knowledge, offline handwritten Chinese recognition has gained breakthroughs in methods and performance in recent years. However, there have yet to be articles that provide a technical review of this field since 2016. In light of this, this paper reviews the research progress and challenges of offline handwritten Chinese recognition based on traditional techniques, deep learning methods, methods combining deep learning with traditional techniques, and knowledge from other areas from 2016 to 2022. Firstly, it introduces the research background and status of handwritten Chinese recognition, standard datasets, and evaluation metrics. Secondly, a comprehensive summary and analysis of offline HCCR and offline HCTR approaches during the last seven years is provided, along with an explanation of their concepts, specifics, and performances. Finally, the main research problems in this field over the past few years are presented. The challenges that still exist in offline handwritten Chinese recognition are discussed, aiming to inspire future research work. Full article

Article
Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images
J. Imaging 2022, 8(11), 294; https://doi.org/10.3390/jimaging8110294 - 22 Oct 2022
Cited by 2 | Viewed by 1715
Abstract
Captioning is the process of assembling a description for an image. Previous research on captioning has usually focused on foreground objects. In captioning concepts, there are two main objects for discussion: background object and foreground object. In contrast to the previous image-captioning research, generating captions from the geological images of rocks is more focused on the background of the images. This study proposed image captioning using a convolutional neural network, long short-term memory, and word2vec to generate words from the image. The proposed model was constructed by a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec and gave a dense output of 256 units. To make it properly grammatical, a sequence of predicted words was reconstructed into a sentence by the beam search algorithm with K = 3. An evaluation of the pre-trained baseline model VGG16 and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models used BLEU score methods for the N-gram. The BLEU scores achieved for BLEU-1 using these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively. BLEU-2 showed scores of 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578, respectively. BLEU-3 performed with scores of 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307, respectively. Finally, BLEU-4 had scores of 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537, respectively. Our CNN-C model outperformed the other models, especially the baseline model. Furthermore, there are several future challenges in studying captions, such as geological sentence structure, geological sentence phrase, and constructing words by a geological tagger. Full article
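Editor's note: a minimal sketch of the beam search decoding (K = 3) mentioned in the abstract; `step_fn` is a hypothetical stand-in for the CNN-LSTM decoder and is assumed to return next-word log-probabilities given a partial word sequence.

```python
import numpy as np

def beam_search(step_fn, start_token: int, end_token: int, k: int = 3, max_len: int = 20):
    """Keep the k highest-scoring partial captions at every decoding step."""
    beams = [([start_token], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                candidates.append((seq, score))   # finished captions are kept as-is
                continue
            log_probs = step_fn(seq)
            for w in np.argsort(log_probs)[-k:]:  # k best next words
                candidates.append((seq + [int(w)], score + float(log_probs[w])))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return beams[0][0]
```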

Article
Face Anti-Spoofing Method Based on Residual Network with Channel Attention Mechanism
Electronics 2022, 11(19), 3056; https://doi.org/10.3390/electronics11193056 - 25 Sep 2022
Cited by 2 | Viewed by 1569
Abstract
The face recognition system is vulnerable to spoofing attacks by photos or videos of a valid user face. However, edge degradation and texture blurring occur when non-living face images are used to attack the face recognition system. With this in mind, a novel face anti-spoofing method combines the residual network and the channel attention mechanism. In our method, the residual network extracts the texture differences of features between face images. In contrast, the attention mechanism focuses on the differences of shadow and edge features located on nasal and cheek areas between living and non-living face images. It can assign weights to different filter features of the face image and enhance the ability of network extraction and expression of different key features in the nasal and cheek regions, improving detection accuracy. The experiments were performed on the public face anti-spoofing datasets of Replay-Attack and CASIA-FASD. We found the best value of the parameter r suitable for face anti-spoofing research is 16, and the accuracy of the method is 99.98% and 97.75%, respectively. Furthermore, to enhance the robustness of the method to illumination changes, the experiment was also performed on the datasets with light changes and achieved a good result. Full article
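Editor's note: a generic PyTorch sketch of a squeeze-and-excitation channel attention block with reduction ratio r = 16, the value the abstract reports as best; the block is shown on its own, independently of the paper's residual backbone.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention with reduction ratio r."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: per-channel statistics
        return x * w.view(b, c, 1, 1)     # excitation: reweight channels
```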
