Topic Editors

  • Prof. Dr. Antonio Fernández-Caballero, Instituto de Investigación en Informática de Albacete, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
  • Prof. Dr. Byung-Gyu Kim, Department of IT Engineering, Sookmyung Women’s University, Seoul 04310, Republic of Korea

Applied Computer Vision and Pattern Recognition: 2nd Volume

Abstract submission deadline: closed (30 June 2023)
Manuscript submission deadline: closed (30 September 2023)
Viewed by: 19290

Topic Information

Dear Colleagues,

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Computer vision tasks include methods for acquiring digital images (through image sensors), processing them, and analyzing them to reach an understanding of their content. In general, the field deals with extracting high-dimensional data from the real world in order to produce numerical or symbolic information that a computer can interpret. In this interpretation step, computer vision is closely related to pattern recognition.

Indeed, pattern recognition is the process of recognizing patterns in data by means of machine learning algorithms. It can be defined as the identification and classification of meaningful patterns in data based on the extraction and comparison of characteristic properties or features of those data. Pattern recognition is a very important area of research and application, underpinning developments in related fields such as computer vision, image processing, text and document analysis, and neural networks. It is closely related to machine learning and finds applications in rapidly emerging areas such as biometrics, bioinformatics, multimedia data analysis, and, more recently, data science. Nowadays, data-driven approaches (such as deep learning) are widely used to achieve pattern recognition and classification in many applications.

This Topic, on Applied Computer Vision and Pattern Recognition, invites papers on theoretical and applied issues, including, but not limited to, the following areas:

  • Statistical, structural, and syntactic pattern recognition;
  • Neural networks, machine learning, and deep learning;
  • Computer vision, robot vision, and machine vision;
  • Multimedia systems and multimedia content;
  • Biosignal processing, speech processing, image processing, and video processing;
  • Data mining, information retrieval, big data, and business intelligence.

This Topic will present the results of research describing recent advances in both the computer vision and pattern recognition fields.

Prof. Dr. Antonio Fernández-Caballero
Prof. Dr. Byung-Gyu Kim
Topic Editors

Keywords

  • pattern recognition
  • neural networks, machine learning
  • deep learning, artificial intelligence
  • computer vision
  • multimedia
  • data mining
  • signal processing
  • image processing

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Applied Sciences (applsci) | 2.7 | 4.5 | 2011 | 15.8 days | CHF 2300
Electronics (electronics) | 2.9 | 4.7 | 2012 | 15.8 days | CHF 2200
Machine Learning and Knowledge Extraction (make) | 3.9 | 8.5 | 2019 | 19.2 days | CHF 1400
Journal of Imaging (jimaging) | 3.2 | 4.4 | 2015 | 21.9 days | CHF 1600
Sensors (sensors) | 3.9 | 6.8 | 2001 | 16.4 days | CHF 2600

Preprints is a platform dedicated to making early versions of research outputs permanently available and citable. MDPI journals allow posting on preprint servers such as Preprints.org prior to publication. For more details about preprints, please visit https://www.preprints.org.

Published Papers (22 papers)

Article
Few-Shot Air Object Detection Network
Electronics 2023, 12(19), 4133; https://doi.org/10.3390/electronics12194133 - 04 Oct 2023
Viewed by 108
Abstract
Focusing on the problem of low detection precision caused by the few-shot and multi-scale characteristics of air objects, we propose a few-shot air object detection network (FADNet). We first use a transformer as the backbone network of the model and then build a multi-scale attention mechanism (MAM) to deeply fuse the W- and H-dimension features extracted from the channel dimension and the local and global features extracted from the spatial dimension with the object features to improve the network’s performance when detecting air objects. Second, the neck network is innovated based on the path aggregation network (PANet), resulting in an improved path aggregation network (IPANet). Our proposed network reduces the information lost during feature transfer by introducing a jump connection, utilizes sparse connection convolution, strengthens feature extraction abilities at all scales, and improves the discriminative properties of air object features at all scales. Finally, we propose a multi-scale regional proposal network (MRPN) that can establish multiple RPNs based on the scale types of the output features, utilizing adaptive convolutions to effectively extract object features at each scale and enhancing the ability to process multi-scale information. The experimental results showed that our proposed method exhibits good performance and generalization, especially in the 1-, 2-, 3-, 5-, and 10-shot experiments, with average accuracies of 33.2%, 36.8%, 43.3%, 47.2%, and 60.4%, respectively. The FADNet solves the problems posed by the few-shot characteristics and multi-scale characteristics of air objects, as well as improving the detection capabilities of the air object detection model. Full article

Article
Dual Histogram Equalization Algorithm Based on Adaptive Image Correction
Appl. Sci. 2023, 13(19), 10649; https://doi.org/10.3390/app131910649 - 25 Sep 2023
Viewed by 170
Abstract
For the visual measurement of moving arm holes in complex working conditions, a histogram equalization algorithm can be used to improve image contrast. To lessen the problems of image brightness shift, image over-enhancement, and gray-level merging that occur with the traditional histogram equalization algorithm, a dual histogram equalization algorithm based on adaptive image correction (AICHE) is proposed. To prevent luminance shifts from occurring during image equalization, the AICHE algorithm protects the average luminance of the input image by improving upon the Otsu algorithm, enabling it to split the histogram. Then, the AICHE algorithm uses the local grayscale correction algorithm to correct the grayscale to prevent the image over-enhancement and gray-level merging problems that arise with the traditional algorithm. It is experimentally verified that the AICHE algorithm can significantly improve the histogram segmentation effect and enhance the contrast and detail information while protecting the average brightness of the input image, and thus the image quality is significantly increased. Full article
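Editor's note: a minimal Python sketch of the histogram-splitting idea behind dual equalization, assuming a single-channel 8-bit image and using OpenCV's Otsu threshold to divide the histogram; the adaptive grayscale-correction stage of AICHE is not reproduced here.

```python
import cv2
import numpy as np

def otsu_split_equalization(gray: np.ndarray) -> np.ndarray:
    """Equalize the sub-histograms below and above the Otsu threshold
    separately, which limits the brightness shift of plain equalization.
    A generic sketch, not the authors' full AICHE pipeline."""
    t, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    t = int(t)
    out = np.empty_like(gray)
    for mask, lo, hi in ((gray <= t, 0, t), (gray > t, t + 1, 255)):
        vals = gray[mask]
        if vals.size == 0:
            continue
        hist = np.bincount(vals - lo, minlength=hi - lo + 1)
        cdf = hist.cumsum().astype(np.float64)
        cdf /= cdf[-1]
        lut = (lo + cdf * (hi - lo)).astype(np.uint8)  # map each sub-range onto itself
        out[mask] = lut[vals - lo]
    return out
```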

Article
Saliency-Driven Hand Gesture Recognition Incorporating Histogram of Oriented Gradients (HOG) and Deep Learning
Sensors 2023, 23(18), 7790; https://doi.org/10.3390/s23187790 - 11 Sep 2023
Viewed by 346
Abstract
Hand gesture recognition is a vital means of communication to convey information between humans and machines. We propose a novel model for hand gesture recognition based on computer vision methods and compare results based on images with complex scenes. While extracting skin color information is an efficient method to determine hand regions, complicated image backgrounds adversely affect recognizing the exact area of the hand shape. Some valuable features like saliency maps, histogram of oriented gradients (HOG), Canny edge detection, and skin color help us maximize the accuracy of hand shape recognition. Considering these features, we proposed an efficient hand posture detection model that improves the test accuracy results to over 99% on the NUS Hand Posture Dataset II and more than 97% on the hand gesture dataset with different challenging backgrounds. In addition, we added noise to around 60% of our datasets. Replicating our experiment, we achieved more than 98% and nearly 97% accuracy on NUS and hand gesture datasets, respectively. Experiments illustrate that the saliency method with HOG has stable performance for a wide range of images with complex backgrounds having varied hand colors and sizes. Full article
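Editor's note: as a rough illustration of the HOG component only, the sketch below extracts HOG descriptors from hand crops and trains a linear SVM; the saliency, Canny, and skin-colour cues and the deep learning stage of the paper are omitted, and the 64x64 crop size and HOG parameters are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC

def hog_descriptor(image: np.ndarray) -> np.ndarray:
    """HOG descriptor of a grayscale hand crop, resized to a fixed size."""
    patch = resize(image, (64, 64), anti_aliasing=True)
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_classifier(train_images, train_labels):
    """train_images: grayscale hand crops (e.g. after saliency/skin-colour
    segmentation); train_labels: gesture classes."""
    X = np.stack([hog_descriptor(img) for img in train_images])
    clf = LinearSVC()  # any classifier (or a CNN head) could be used here
    clf.fit(X, train_labels)
    return clf
```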

Article
Detection of Wheat Yellow Rust Disease Severity Based on Improved GhostNetV2
Appl. Sci. 2023, 13(17), 9987; https://doi.org/10.3390/app13179987 - 04 Sep 2023
Viewed by 365
Abstract
Wheat production safety is facing serious challenges because wheat yellow rust is a worldwide disease. Wheat yellow rust may have no obvious external manifestations in the early stage, and it is difficult to detect whether it is infected, but in the middle and late stages of onset, the symptoms of the disease are obvious, though the severity is difficult to distinguish. A traditional deep learning network model has a large number of parameters, a large amount of calculation, a long time for model training, and high resource consumption, making it difficult to transplant to mobile and edge terminals. To address the above issues, this study proposes an optimized GhostNetV2 approach. First, to increase communication between groups, a channel rearrangement operation is performed on the output of the Ghost module. Then, the first five G-bneck layers of the source model GhostNetV2 are replaced with Fused-MBConv to accelerate model training. Finally, to further improve the model’s identification of diseases, the source attention mechanism SE is replaced by ECA. After experimental comparison, the improved algorithm shortens the training time by 37.49%, and the accuracy rate reaches 95.44%, which is 2.24% higher than the GhostNetV2 algorithm. The detection accuracy and speed have major improvements compared with other lightweight model algorithms. Full article
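Editor's note: the "channel rearrangement" applied to the Ghost module output can be illustrated with a ShuffleNet-style channel shuffle; a hedged PyTorch sketch in which the group count is an assumption, not the paper's setting.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Rearrange channels so that information is mixed across groups,
    a generic sketch of the rearrangement described in the abstract."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()  # interleave channels across groups
    return x.view(b, c, h, w)
```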

Article
A Long Skip Connection for Enhanced Color Selectivity in CNN Architectures
Sensors 2023, 23(17), 7582; https://doi.org/10.3390/s23177582 - 31 Aug 2023
Viewed by 319
Abstract
Some recent studies show that filters in convolutional neural networks (CNNs) have low color selectivity in datasets of natural scenes such as Imagenet. CNNs, bio-inspired by the visual cortex, are characterized by their hierarchical learning structure which appears to gradually transform the representation space. Inspired by the direct connection between the LGN and V4, which allows V4 to handle low-level information closer to the trichromatic input in addition to processed information that comes from V2/V3, we propose the addition of a long skip connection (LSC) between the first and last blocks of the feature extraction stage to allow deeper parts of the network to receive information from shallower layers. This type of connection improves classification accuracy by combining simple-visual and complex-abstract features to create more color-selective ones. We have applied this strategy to classic CNN architectures and quantitatively and qualitatively analyzed the improvement in accuracy while focusing on color selectivity. The results show that, in general, skip connections improve accuracy, but LSC improves it even more and enhances the color selectivity of the original CNN architectures. As a side result, we propose a new color representation procedure for organizing and filtering feature maps, making their visualization more manageable for qualitative color selectivity analysis. Full article
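Editor's note: a minimal PyTorch sketch of a long skip connection between the first and last blocks of a feature extractor; the 1x1 projection and fusion by addition are illustrative choices, not the exact mechanism of the cited paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LongSkipBackbone(nn.Module):
    """Wrap a sequence of convolutional stages and add a long skip connection
    (LSC) from the first stage's output to the last stage's output."""
    def __init__(self, blocks: nn.ModuleList, first_channels: int, last_channels: int):
        super().__init__()
        self.blocks = blocks
        # Project shallow features so that channels match the deepest stage.
        self.project = nn.Conv2d(first_channels, last_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shallow = self.blocks[0](x)        # simple, colour-rich features
        deep = shallow
        for block in self.blocks[1:]:
            deep = block(deep)             # complex, abstract features
        skip = self.project(shallow)
        # Match spatial size before fusing shallow and deep representations.
        skip = F.adaptive_avg_pool2d(skip, deep.shape[-2:])
        return deep + skip
```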

Article
MCMNET: Multi-Scale Context Modeling Network for Temporal Action Detection
Sensors 2023, 23(17), 7563; https://doi.org/10.3390/s23177563 - 31 Aug 2023
Viewed by 354
Abstract
Temporal action detection is a very important and challenging task in the field of video understanding, especially for datasets with significant differences in action duration. The temporal relationships between the action instances contained in these datasets are very complex. For such videos, it is necessary to capture information with a richer temporal distribution as much as possible. In this paper, we propose a dual-stream model that can model contextual information at multiple temporal scales. First, the input video is divided into two resolution streams, followed by a Multi-Resolution Context Aggregation module to capture multi-scale temporal information. Additionally, an Information Enhancement module is added after the high-resolution input stream to model both long-range and short-range contexts. Finally, the outputs of the two modules are merged to obtain features with rich temporal information for action localization and classification. We conducted experiments on three datasets to evaluate the proposed approach. On ActivityNet-v1.3, an average mAP (mean Average Precision) of 32.83% was obtained. On Charades, the best performance was obtained, with an average mAP of 27.3%. On TSU (Toyota Smarthome Untrimmed), an average mAP of 33.1% was achieved. Full article

Article
Infrared Dim and Small Target Sequence Dataset Generation Method Based on Generative Adversarial Networks
Electronics 2023, 12(17), 3625; https://doi.org/10.3390/electronics12173625 - 28 Aug 2023
Viewed by 411
Abstract
With the development of infrared technology, infrared dim and small target detection plays a vital role in precision guidance applications. To address the problems of insufficient dataset coverage and huge actual shooting costs in infrared dim and small target detection methods, this paper proposes a method for generating infrared dim and small target sequence datasets based on generative adversarial networks (GANs). Specifically, first, the improved deep convolutional generative adversarial network (DCGAN) model is used to generate clear images of the infrared sky background. Then, target–background sequence images are constructed using multi-scale feature extraction and improved conditional generative adversarial networks. This method fully considers the infrared characteristics of the target and the background, which can achieve effective expansion of the image data and provide a test set for the infrared small target detection and recognition algorithm. In addition, the classifier’s performance can be improved by expanding the training set, which enhances the accuracy and effect of infrared dim and small target detection based on deep learning. After experimental evaluation, the dataset generated by this method is similar to the real infrared dataset, and the model detection accuracy can be improved after training with the latest deep learning model. Full article

Article
Unification of Road Scene Segmentation Strategies Using Multistream Data and Latent Space Attention
Sensors 2023, 23(17), 7355; https://doi.org/10.3390/s23177355 - 23 Aug 2023
Viewed by 422
Abstract
Road scene understanding, as a field of research, has attracted increasing attention in recent years. The development of road scene understanding capabilities that are applicable to real-world road scenarios has seen numerous complications. This has largely been due to the cost and complexity of achieving human-level scene understanding, at which successful segmentation of road scene elements can be achieved with a mean intersection over union score close to 1.0. There is a need for more of a unified approach to road scene segmentation for use in self-driving systems. Previous works have demonstrated how deep learning methods can be combined to improve the segmentation and perception performance of road scene understanding systems. This paper proposes a novel segmentation system that uses fully connected networks, attention mechanisms, and multiple-input data stream fusion to improve segmentation performance. Results show comparable performance compared to previous works, with a mean intersection over union of 87.4% on the Cityscapes dataset. Full article
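Editor's note: the reported metric, mean intersection over union, can be computed per class as in the short NumPy sketch below for integer label maps; classes absent from both prediction and ground truth are skipped.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union for semantic segmentation maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class not present in either map
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```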

Article
Vision Transformer Customized for Environment Detection and Collision Prediction to Assist the Visually Impaired
J. Imaging 2023, 9(8), 161; https://doi.org/10.3390/jimaging9080161 - 15 Aug 2023
Viewed by 540
Abstract
This paper presents a system that utilizes vision transformers and multimodal feedback modules to facilitate navigation and collision avoidance for the visually impaired. By implementing vision transformers, the system achieves accurate object detection, enabling the real-time identification of objects in front of the user. Semantic segmentation and the algorithms developed in this work provide a means to generate a trajectory vector of all identified objects from the vision transformer and to detect objects that are likely to intersect with the user’s walking path. Audio and vibrotactile feedback modules are integrated to convey collision warning through multimodal feedback. The dataset used to create the model was captured from both indoor and outdoor settings under different weather conditions at different times across multiple days, resulting in 27,867 photos consisting of 24 different classes. Classification results showed good performance (95% accuracy), supporting the efficacy and reliability of the proposed model. The design and control methods of the multimodal feedback modules for collision warning are also presented, while the experimental validation concerning their usability and efficiency stands as an upcoming endeavor. The demonstrated performance of the vision transformer and the presented algorithms in conjunction with the multimodal feedback modules show promising prospects of its feasibility and applicability for the navigation assistance of individuals with vision impairment. Full article

Article
Center Deviation Measurement of Color Contact Lenses Based on a Deep Learning Model and Hough Circle Transform
Sensors 2023, 23(14), 6533; https://doi.org/10.3390/s23146533 - 19 Jul 2023
Viewed by 528
Abstract
Ensuring the quality of color contact lenses is vital, particularly in detecting defects during their production since they are directly worn on the eyes. One significant defect is the “center deviation (CD) defect”, where the colored area (CA) deviates from the center point. Measuring the extent of deviation of the CA from the center point is necessary to detect these CD defects. In this study, we propose a method that utilizes image processing and analysis techniques for detecting such defects. Our approach involves employing semantic segmentation to simplify the image and reduce noise interference and utilizing the Hough circle transform algorithm to measure the deviation of the center point of the CA in color contact lenses. Experimental results demonstrated that our proposed method achieved a 71.2% reduction in error compared with existing research methods. Full article
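Editor's note: a sketch of the measurement idea with OpenCV, assuming the coloured area is already available as a binary mask (e.g. from a segmentation model); the Hough parameter values are illustrative, not the paper's configuration.

```python
import cv2
import numpy as np

def center_deviation(lens_gray: np.ndarray, colored_mask: np.ndarray) -> float:
    """Locate the lens outline with the Hough circle transform and compare
    its centre with the centroid of the coloured area (in pixels)."""
    circles = cv2.HoughCircles(lens_gray, cv2.HOUGH_GRADIENT, dp=1.2,
                               minDist=lens_gray.shape[0] // 2,
                               param1=100, param2=40)
    if circles is None:
        raise ValueError("no lens circle found")
    cx, cy, _ = circles[0, 0]
    ys, xs = np.nonzero(colored_mask)
    mx, my = xs.mean(), ys.mean()              # centroid of the coloured area
    return float(np.hypot(cx - mx, cy - my))   # centre deviation in pixels
```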

Article
Improving Small-Scale Human Action Recognition Performance Using a 3D Heatmap Volume
Sensors 2023, 23(14), 6364; https://doi.org/10.3390/s23146364 - 13 Jul 2023
Viewed by 517
Abstract
In recent years, skeleton-based human action recognition has garnered significant research attention, with proposed recognition or segmentation methods typically validated on large-scale coarse-grained action datasets. However, there remains a lack of research on the recognition of small-scale fine-grained human actions using deep learning methods, which have greater practical significance. To address this gap, we propose a novel approach based on heatmap-based pseudo videos and a unified, general model applicable to all modality datasets. Leveraging anthropometric kinematics as prior information, we extract common human motion features among datasets through an ad hoc pre-trained model. To overcome joint mismatch issues, we partition the human skeleton into five parts, a simple yet effective technique for information sharing. Our approach is evaluated on two datasets, including the public Nursing Activities and our self-built Tai Chi Action dataset. Results from linear evaluation protocol and fine-tuned evaluation demonstrate that our pre-trained model effectively captures common motion features among human actions and achieves steady and precise accuracy across all training settings, while mitigating network overfitting. Notably, our model outperforms state-of-the-art models in recognition accuracy when fusing joint and limb modality features along the channel dimension. Full article
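Editor's note: a sketch of how a heatmap-based pseudo video can be built from 2D joint coordinates: render a Gaussian heatmap per joint per frame and stack the frames; the sigma value and tensor layout are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def joint_heatmaps(joints_xy: np.ndarray, h: int, w: int, sigma: float = 2.0) -> np.ndarray:
    """Render one Gaussian heatmap per joint for a single frame.
    joints_xy has shape (num_joints, 2) with (x, y) pixel coordinates."""
    ys, xs = np.mgrid[0:h, 0:w]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)) for x, y in joints_xy]
    return np.stack(maps)                      # (num_joints, H, W)

def pseudo_video(skeleton_seq: np.ndarray, h: int, w: int) -> np.ndarray:
    """Stack per-frame heatmaps into a (T, num_joints, H, W) volume that a
    video backbone can consume like an ordinary clip."""
    return np.stack([joint_heatmaps(frame, h, w) for frame in skeleton_seq])
```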

Article
Unsupervised Vehicle Re-Identification Based on Cross-Style Semi-Supervised Pre-Training and Feature Cross-Division
Electronics 2023, 12(13), 2931; https://doi.org/10.3390/electronics12132931 - 03 Jul 2023
Viewed by 389
Abstract
Vehicle Re-Identification (Re-ID) based on Unsupervised Domain Adaptation (UDA) has shown promising performance. However, two main issues still exist: (1) existing methods that use Generative Adversarial Networks (GANs) for domain gap alleviation combine supervised learning with hard labels of the source domain, resulting in a mismatch between style transfer data and hard labels; (2) pseudo label assignment in the fine-tuning stage is solely determined by similarity measures of global features using clustering algorithms, leading to inevitable label noise in generated pseudo labels. To tackle these issues, this paper proposes an unsupervised vehicle re-identification framework based on cross-style semi-supervised pre-training and feature cross-division. The framework consists of two parts: cross-style semi-supervised pre-training (CSP) and feature cross-division (FCD) for model fine-tuning. The CSP module generates style transfer data containing source domain content and target domain style using a style transfer network, and then pre-trains the model in a semi-supervised manner using both source domain and style transfer data. A pseudo-label reassignment strategy is designed to generate soft labels assigned to the style transfer data. The FCD module obtains feature partitions through a novel interactive division to reduce the dependence of pseudo-labels on global features, and the final similarity measurement combines the results of partition features and global features. Experimental results on the VehicleID and VeRi-776 datasets show that the proposed method outperforms existing unsupervised vehicle re-identification methods. Compared with the last best method on each dataset, the method proposed in this paper improves the mAP by 0.63% and the Rank-1 by 0.73% on the three sub-datasets of VehicleID on average, and it improves mAP by 0.9% and Rank-1 by 1% on VeRi-776 dataset. Full article
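Editor's note: the baseline pseudo-label assignment that the FCD module refines is, in essence, clustering of global features; a minimal scikit-learn sketch, where eps and min_samples are illustrative values rather than the paper's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def pseudo_labels(features: np.ndarray, eps: float = 0.5) -> np.ndarray:
    """Assign pseudo labels to unlabeled target-domain samples by clustering
    their L2-normalised global features; noise samples get label -1 and are
    usually discarded before fine-tuning."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    return DBSCAN(eps=eps, min_samples=4, metric="euclidean").fit_predict(feats)
```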

Article
Transfer Learning for Sentiment Classification Using Bidirectional Encoder Representations from Transformers (BERT) Model
Sensors 2023, 23(11), 5232; https://doi.org/10.3390/s23115232 - 31 May 2023
Cited by 1 | Viewed by 1368
Abstract
Sentiment is currently one of the most emerging areas of research due to the large amount of web content coming from social networking websites. Sentiment analysis is a crucial process for recommending systems for most people. Generally, the purpose of sentiment analysis is to determine an author’s attitude toward a subject or the overall tone of a document. There is a huge collection of studies that make an effort to predict how useful online reviews will be and have produced conflicting results on the efficacy of different methodologies. Furthermore, many of the current solutions employ manual feature generation and conventional shallow learning methods, which restrict generalization. As a result, the goal of this research is to develop a general approach using transfer learning by applying the “BERT (Bidirectional Encoder Representations from Transformers)”-based model. The efficiency of BERT classification is then evaluated by comparing it with similar machine learning techniques. In the experimental evaluation, the proposed model demonstrated superior performance in terms of outstanding prediction and high accuracy compared to earlier research. Comparative tests conducted on positive and negative Yelp reviews reveal that fine-tuned BERT classification performs better than other approaches. In addition, it is observed that BERT classifiers using batch size and sequence length significantly affect classification performance. Full article
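Editor's note: a minimal fine-tuning sketch for binary sentiment classification with a pretrained BERT model via the Hugging Face transformers library; dataset handling, full-corpus batching, and the hyperparameters (learning rate, batch size, sequence length) are simplified assumptions, not the paper's configuration.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["The food was amazing", "Terrible service, never again"]  # toy examples
labels = torch.tensor([1, 0])

model.train()
batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
optimizer.zero_grad()
outputs = model(**batch, labels=labels)   # returns loss and logits
outputs.loss.backward()
optimizer.step()
```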

Article
Lightweight Multiscale CNN Model for Wheat Disease Detection
Appl. Sci. 2023, 13(9), 5801; https://doi.org/10.3390/app13095801 - 08 May 2023
Cited by 1 | Viewed by 1182
Abstract
Wheat disease detection is crucial for disease diagnosis, pesticide application optimization, disease control, and wheat yield and quality improvement. However, the detection of wheat diseases is difficult due to their various types. Detecting wheat diseases in complex fields is also challenging. Traditional models are difficult to apply to mobile devices because they have large parameters, and high computation and resource requirements. To address these issues, this paper combines the residual module and the inception module to construct a lightweight multiscale CNN model, which introduces the CBAM and ECA modules into the residual block, enhances the model’s attention to diseases, and reduces the influence of complex backgrounds on disease recognition. The proposed method has an accuracy rate of 98.7% on the test dataset, which is higher than classic convolutional neural networks such as AlexNet, VGG16, and InceptionresnetV2 and lightweight models such as MobileNetV3 and EfficientNetb0. The proposed model has superior performance and can be applied to mobile terminals to quickly identify wheat diseases. Full article
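Editor's note: a generic PyTorch sketch of the ECA attention block mentioned in the abstract (global average pooling, a 1D convolution across channels, and a sigmoid gate); the kernel size k = 3 is an assumption, and the block is shown independently of the paper's residual structure.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention block."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = x.mean(dim=(2, 3))                    # (B, C) global average pooling
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # local cross-channel interaction
        return x * torch.sigmoid(y).view(b, c, 1, 1)
```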

Article
Development of an Accurate and Automated Quality Inspection System for Solder Joints on Aviation Plugs Using Fine-Tuned YOLOv5 Models
Appl. Sci. 2023, 13(9), 5290; https://doi.org/10.3390/app13095290 - 23 Apr 2023
Cited by 4 | Viewed by 1038
Abstract
The quality inspection of solder joints on aviation plugs is extremely important in modern manufacturing industries. However, this task is still mostly performed by skilled workers after welding operations, posing the problems of subjective judgment and low efficiency. To address these issues, an accurate and automated detection system using fine-tuned YOLOv5 models is developed in this paper. Firstly, we design an intelligent image acquisition system to obtain the high-resolution image of each solder joint automatically. Then, a two-phase approach is proposed for fast and accurate weld quality detection. In the first phase, a fine-tuned YOLOv5 model is applied to extract the region of interest (ROI), i.e., the row of solder joints to be inspected, within the whole image. With the sliding platform, the ROI is automatically moved to the center of the image to enhance its imaging clarity. Subsequently, another fine-tuned YOLOv5 model takes this adjusted ROI as input and realizes quality assessment. Finally, a concise and easy-to-use GUI has been designed and deployed in real production lines. Experimental results in the actual production line show that the proposed method can achieve a detection accuracy of more than 97.5% with a detection speed of about 0.1 s, which meets the needs of actual production. Full article
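Editor's note: a hedged sketch of the two-phase idea using YOLOv5 models loaded through torch.hub; the weight file names are placeholders for the authors' fine-tuned models, and the crop handling assumes a NumPy image in (H, W, C) layout.

```python
import torch

# Phase 1 locates the row of solder joints (the ROI); phase 2 grades each joint.
roi_model = torch.hub.load("ultralytics/yolov5", "custom", path="roi_weights.pt")
joint_model = torch.hub.load("ultralytics/yolov5", "custom", path="joint_weights.pt")

def inspect(image):
    rois = roi_model(image).xyxy[0]            # (N, 6): x1, y1, x2, y2, conf, cls
    results = []
    for x1, y1, x2, y2, *_ in rois.tolist():
        crop = image[int(y1):int(y2), int(x1):int(x2)]
        results.append(joint_model(crop).xyxy[0])
    return results
```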

Article
FM-STDNet: High-Speed Detector for Fast-Moving Small Targets Based on Deep First-Order Network Architecture
Electronics 2023, 12(8), 1829; https://doi.org/10.3390/electronics12081829 - 12 Apr 2023
Viewed by 603
Abstract
Identifying objects of interest from digital vision signals is a core task of intelligent systems. However, fast and accurate identification of small moving targets in real-time has become a bottleneck in the field of target detection. In this paper, the problem of real-time detection of the fast-moving printed circuit board (PCB) tiny targets is investigated. This task is very challenging because PCB defects are usually small compared to the whole PCB board, and due to the pursuit of production efficiency, the actual production PCB moving speed is usually very fast, which puts higher requirements on the real-time of intelligent systems. To this end, a new model of FM-STDNet (Fast Moving Small Target Detection Network) is proposed based on the well-known deep learning detector YOLO (You Only Look Once) series model. First, based on the SPPNet (Spatial Pyramid Pooling Networks) network, a new SPPFCSP (Spatial Pyramid Pooling Fast Cross Stage Partial Network) spatial pyramid pooling module is designed to adapt to the extraction of different scale size features of different size input images, which helps retain the high semantic information of smaller features; then, the anchor-free mode is introduced to directly classify the regression prediction information and do the structural reparameterization construction to design a new high-speed prediction head RepHead to further improve the operation speed of the detector. The experimental results show that the proposed detector achieves 99.87% detection accuracy at the fastest speed compared to state-of-the-art depth detectors such as YOLOv3, Faster R-CNN, and TDD-Net in the fast-moving PCB surface defect detection task. The new model of FM-STDNet provides an effective reference for the fast-moving small target detection task. Full article
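Editor's note: a minimal PyTorch sketch of an SPPNet-style spatial pyramid pooling block that concatenates max pooling at several kernel sizes; the paper's SPPFCSP variant adds cross-stage-partial connections that are not shown here, and the kernel sizes are illustrative.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: concatenate the input with max-pooled copies
    at several receptive-field sizes to aggregate multi-scale context."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)
```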

Article
Improved Mask R-CNN Multi-Target Detection and Segmentation for Autonomous Driving in Complex Scenes
Sensors 2023, 23(8), 3853; https://doi.org/10.3390/s23083853 - 10 Apr 2023
Cited by 2 | Viewed by 1553
Abstract
Vision-based target detection and segmentation has been an important research topic for environment perception in autonomous driving, but the mainstream target detection and segmentation algorithms have the problems of low detection accuracy and poor mask segmentation quality for multi-target detection and segmentation in complex traffic scenes. To address this problem, this paper improved the Mask R-CNN by replacing the backbone network ResNet with the ResNeXt network with group convolution to further improve the feature extraction capability of the model. Furthermore, a bottom-up path enhancement strategy was added to the Feature Pyramid Network (FPN) to achieve feature fusion, while an efficient channel attention module (ECA) was added to the backbone feature extraction network to optimize the high-level low resolution semantic information graph. Finally, the bounding box regression loss function smooth L1 loss was replaced by CIoU loss to speed up the model convergence and minimize the error. The experimental results showed that the improved Mask R-CNN algorithm achieved 62.62% mAP for target detection and 57.58% mAP for segmentation accuracy on the publicly available CityScapes autonomous driving dataset, which were 4.73% and 3.96% better than the original Mask R-CNN algorithm, respectively. The migration experiments showed that it has good detection and segmentation effects in each traffic scenario of the publicly available BDD autonomous driving dataset. Full article
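Editor's note: a hedged torchvision sketch of the backbone swap described in the abstract, building a Mask R-CNN on a ResNeXt-101 FPN backbone; the class count is an assumption, and the paper's path augmentation, ECA module, and CIoU loss are not reproduced.

```python
from torchvision.models import ResNeXt101_32X8D_Weights
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# ResNeXt-101 (32x8d) backbone with an FPN, initialised from ImageNet weights.
backbone = resnet_fpn_backbone(backbone_name="resnext101_32x8d",
                               weights=ResNeXt101_32X8D_Weights.DEFAULT)
model = MaskRCNN(backbone, num_classes=9)  # e.g. 8 traffic classes + background (assumed)
```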

Article
Insights into Batch Selection for Event-Camera Motion Estimation
Sensors 2023, 23(7), 3699; https://doi.org/10.3390/s23073699 - 03 Apr 2023
Viewed by 923
Abstract
Event cameras measure scene changes with high temporal resolutions, making them well-suited for visual motion estimation. The activation of pixels results in an asynchronous stream of digital data (events), which rolls continuously over time without the discrete temporal boundaries typical of frame-based cameras (where a data packet or frame is emitted at a fixed temporal rate). As such, it is not trivial to define a priori how to group/accumulate events in a way that is sufficient for computation. The suitable number of events can greatly vary for different environments, motion patterns, and tasks. In this paper, we use neural networks for rotational motion estimation as a scenario to investigate the appropriate selection of event batches to populate input tensors. Our results show that batch selection has a large impact on the results: training should be performed on a wide variety of different batches, regardless of the batch selection method; a simple fixed-time window is a good choice for inference with respect to fixed-count batches, and it also demonstrates comparable performance to more complex methods. Our initial hypothesis that a minimal amount of events is required to estimate motion (as in contrast maximization) is not valid when estimating motion with a neural network. Full article
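Editor's note: the two batch-selection strategies compared in the paper, fixed-time windows and fixed-count batches, can be sketched directly over a sorted array of event timestamps; this is a generic illustration, not the authors' training pipeline.

```python
import numpy as np

def fixed_time_batches(timestamps: np.ndarray, window: float):
    """Group event indices into consecutive fixed-duration windows.
    `timestamps` are event times in seconds, assumed sorted."""
    edges = np.arange(timestamps[0], timestamps[-1] + window, window)
    return [np.flatnonzero((timestamps >= lo) & (timestamps < lo + window)) for lo in edges]

def fixed_count_batches(num_events: int, count: int):
    """Group event indices into consecutive batches of a fixed number of events."""
    return [np.arange(i, min(i + count, num_events)) for i in range(0, num_events, count)]
```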

Article
Self-Supervised Facial Motion Representation Learning via Contrastive Subclips
Electronics 2023, 12(6), 1369; https://doi.org/10.3390/electronics12061369 - 13 Mar 2023
Viewed by 799
Abstract
Facial motion representation learning has become an exciting research topic, since biometric technologies are becoming more common in our daily lives. One of its applications is identity verification. After recording a dynamic facial motion video for enrollment, the user needs to show a matched facial appearance and make a facial motion the same as the enrollment for authentication. Some recent research papers have discussed the benefits of this new biometric technology and reported promising results for both static and dynamic facial motion verification tasks. Our work extends the existing approaches and introduces compound facial actions, which contain more than one dominant facial action in one utterance. We propose a new self-supervised pretraining method called contrastive subclips that improves the model performance with these more complex and secure facial motions. The experimental results show that the contrastive subclips method improves upon the baseline approaches, and the model performance for test data can reach 89.7% average precision. Full article
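Editor's note: a generic InfoNCE-style contrastive loss over paired subclip embeddings, where the other samples in the batch serve as negatives; this illustrates the contrastive idea only, not the paper's exact subclip sampling scheme, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.07):
    """Contrastive loss between (B, D) embeddings of matching subclips."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature              # similarity of every pair in the batch
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)       # diagonal pairs are the positives
```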

Review
The Challenges of Recognizing Offline Handwritten Chinese: A Technical Review
Appl. Sci. 2023, 13(6), 3500; https://doi.org/10.3390/app13063500 - 09 Mar 2023
Cited by 2 | Viewed by 1642
Abstract
Offline handwritten Chinese recognition is an important research area of pattern recognition, including offline handwritten Chinese character recognition (offline HCCR) and offline handwritten Chinese text recognition (offline HCTR), which are closely related to daily life. With new deep learning techniques and their combination with other domain knowledge, offline handwritten Chinese recognition has gained breakthroughs in methods and performance in recent years. However, there have yet to be articles that provide a technical review of this field since 2016. In light of this, this paper reviews the research progress and challenges of offline handwritten Chinese recognition based on traditional techniques, deep learning methods, methods combining deep learning with traditional techniques, and knowledge from other areas from 2016 to 2022. Firstly, it introduces the research background and status of handwritten Chinese recognition, standard datasets, and evaluation metrics. Secondly, a comprehensive summary and analysis of offline HCCR and offline HCTR approaches during the last seven years is provided, along with an explanation of their concepts, specifics, and performances. Finally, the main research problems in this field over the past few years are presented. The challenges that still exist in offline handwritten Chinese recognition are discussed, aiming to inspire future research work. Full article

Article
Hybrid of Deep Learning and Word Embedding in Generating Captions: Image-Captioning Solution for Geological Rock Images
J. Imaging 2022, 8(11), 294; https://doi.org/10.3390/jimaging8110294 - 22 Oct 2022
Cited by 2 | Viewed by 1715
Abstract
Captioning is the process of assembling a description for an image. Previous research on captioning has usually focused on foreground objects. In captioning concepts, there are two main objects for discussion: background object and foreground object. In contrast to the previous image-captioning research, generating captions from the geological images of rocks is more focused on the background of the images. This study proposed image captioning using a convolutional neural network, long short-term memory, and word2vec to generate words from the image. The proposed model was constructed by a convolutional neural network (CNN), long short-term memory (LSTM), and word2vec and gave a dense output of 256 units. To make it properly grammatical, a sequence of predicted words was reconstructed into a sentence by the beam search algorithm with K = 3. An evaluation of the pre-trained baseline model VGG16 and our proposed CNN-A, CNN-B, CNN-C, and CNN-D models used BLEU score methods for the N-gram. The BLEU scores achieved for BLEU-1 using these models were 0.5515, 0.6463, 0.7012, 0.7620, and 0.5620, respectively. BLEU-2 showed scores of 0.6048, 0.6507, 0.7083, 0.8756, and 0.6578, respectively. BLEU-3 performed with scores of 0.6414, 0.6892, 0.7312, 0.8861, and 0.7307, respectively. Finally, BLEU-4 had scores of 0.6526, 0.6504, 0.7345, 0.8250, and 0.7537, respectively. Our CNN-C model outperformed the other models, especially the baseline model. Furthermore, there are several future challenges in studying captions, such as geological sentence structure, geological sentence phrase, and constructing words by a geological tagger. Full article
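Editor's note: a minimal sketch of the beam search decoding (K = 3) mentioned in the abstract; `step_fn` is a hypothetical stand-in for the CNN-LSTM decoder and is assumed to return next-word log-probabilities given a partial word sequence.

```python
import numpy as np

def beam_search(step_fn, start_token: int, end_token: int, k: int = 3, max_len: int = 20):
    """Keep the k highest-scoring partial captions at every decoding step."""
    beams = [([start_token], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                candidates.append((seq, score))   # finished captions are kept as-is
                continue
            log_probs = step_fn(seq)
            for w in np.argsort(log_probs)[-k:]:  # k best next words
                candidates.append((seq + [int(w)], score + float(log_probs[w])))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return beams[0][0]
```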

Article
Face Anti-Spoofing Method Based on Residual Network with Channel Attention Mechanism
Electronics 2022, 11(19), 3056; https://doi.org/10.3390/electronics11193056 - 25 Sep 2022
Cited by 2 | Viewed by 1569
Abstract
The face recognition system is vulnerable to spoofing attacks by photos or videos of a valid user face. However, edge degradation and texture blurring occur when non-living face images are used to attack the face recognition system. With this in mind, a novel face anti-spoofing method combines the residual network and the channel attention mechanism. In our method, the residual network extracts the texture differences of features between face images. In contrast, the attention mechanism focuses on the differences of shadow and edge features located on nasal and cheek areas between living and non-living face images. It can assign weights to different filter features of the face image and enhance the ability of network extraction and expression of different key features in the nasal and cheek regions, improving detection accuracy. The experiments were performed on the public face anti-spoofing datasets of Replay-Attack and CASIA-FASD. We found the best value of the parameter r suitable for face anti-spoofing research is 16, and the accuracy of the method is 99.98% and 97.75%, respectively. Furthermore, to enhance the robustness of the method to illumination changes, the experiment was also performed on the datasets with light changes and achieved a good result. Full article
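Editor's note: a generic PyTorch sketch of a squeeze-and-excitation channel attention block with reduction ratio r = 16, the value the abstract reports as best; the block is shown on its own, independently of the paper's residual backbone.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention with reduction ratio r."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: per-channel statistics
        return x * w.view(b, c, 1, 1)     # excitation: reweight channels
```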
