Deep Learning for Information Fusion and Pattern Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (10 March 2024) | Viewed by 22292

Special Issue Editors


Dr. Yufeng Zheng
Guest Editor
Department of Data Science, University of Mississippi Medical Center, Jackson, MS, USA
Interests: image processing; deep learning; computer vision; computer-aided detection/diagnosis

Dr. Erik Blasch
Guest Editor
Air Force Office of Scientific Research, Arlington, VA 22203-1768, USA
Interests: information fusion; space-aware tracking; industrial avionics; human factors

Special Issue Information

Dear Colleagues,

Large amounts of data are produced by different types of sensors, for instance, multispectral Electro-Optical/Infrared (EO/IR) and computed tomography/magnetic resonance (CT/MR) images, among others. How to take advantage of such multimodal data for object detection and pattern recognition is an active field of research. Information fusion (IF) is an avenue to enhance the performance of pattern classification, and deep learning (DL) technologies, including convolutional neural networks (CNNs), are powerful tools for improving object detection, segmentation, and recognition. Combining DL and IF is a viable way to boost the overall performance of pattern classification and target recognition, since such combinations of powerful techniques can exploit deeply hidden features in multimodal, spatial, or temporal data. Example applications include (but are not limited to) face recognition, cancer detection, image fusion, object detection, and target recognition.

Dr. Yufeng Zheng
Dr. Erik Blasch
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • sensor fusion
  • deep learning
  • convolutional neural networks
  • pattern recognition
  • object detection

Published Papers (13 papers)

Research

19 pages, 3791 KiB  
Article
SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection
by Yanbin Peng, Zhinian Zhai and Mingkun Feng
Sensors 2024, 24(4), 1117; https://doi.org/10.3390/s24041117 - 08 Feb 2024
Viewed by 586
Abstract
Salient Object Detection (SOD) in RGB-D images plays a crucial role in the field of computer vision, with its central aim being to identify and segment the most visually striking objects within a scene. However, optimizing the fusion of multi-modal and multi-scale features to enhance detection performance remains a challenge. To address this issue, we propose a network model based on semantic localization and multi-scale fusion (SLMSF-Net), specifically designed for RGB-D SOD. Firstly, we designed a Deep Attention Module (DAM), which extracts valuable depth feature information from both channel and spatial perspectives and efficiently merges it with RGB features. Subsequently, a Semantic Localization Module (SLM) is introduced to enhance the top-level modality fusion features, enabling the precise localization of salient objects. Finally, a Multi-Scale Fusion Module (MSF) is employed to perform inverse decoding on the modality fusion features, thus restoring the detailed information of the objects and generating high-precision saliency maps. Our approach has been validated across six RGB-D salient object detection datasets. The experimental results indicate an improvement of 0.20~1.80%, 0.09~1.46%, 0.19~1.05%, and 0.0002~0.0062, respectively in maxF, maxE, S, and MAE metrics, compared to the best competing methods (AFNet, DCMF, and C2DFNet). Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
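As a rough illustration of the depth-attention idea summarized above (and not the authors' SLMSF-Net code), the sketch below reweights a depth feature map along the channel and spatial dimensions before merging it into the RGB stream. The module name, channel count, reduction ratio, and kernel sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn

class DepthAttentionFusion(nn.Module):
    """Illustrative depth-attention block: channel and spatial reweighting of depth features."""
    def __init__(self, channels: int):
        super().__init__()
        # Channel attention: squeeze depth features to per-channel weights.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: collapse channels to a single-channel weight map.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        d = depth_feat * self.channel_gate(depth_feat)   # channel-wise reweighting
        d = d * self.spatial_gate(d)                     # spatial reweighting
        return rgb_feat + d                              # merge into the RGB stream

if __name__ == "__main__":
    fuse = DepthAttentionFusion(channels=64)
    rgb = torch.randn(2, 64, 56, 56)
    depth = torch.randn(2, 64, 56, 56)
    print(fuse(rgb, depth).shape)  # torch.Size([2, 64, 56, 56])
```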

24 pages, 6587 KiB  
Article
Remote Photoplethysmography and Motion Tracking Convolutional Neural Network with Bidirectional Long Short-Term Memory: Non-Invasive Fatigue Detection Method Based on Multi-Modal Fusion
by Lingjian Kong, Kai Xie, Kaixuan Niu, Jianbiao He and Wei Zhang
Sensors 2024, 24(2), 455; https://doi.org/10.3390/s24020455 - 11 Jan 2024
Cited by 2 | Viewed by 915
Abstract
Existing vision-based fatigue detection methods commonly utilize RGB cameras to extract facial and physiological features for monitoring driver fatigue. These features often include single indicators such as eyelid movement, yawning frequency, and heart rate. However, the accuracy of RGB cameras can be affected by factors like varying lighting conditions and motion. To address these challenges, we propose a non-invasive method for multi-modal fusion fatigue detection called RPPMT-CNN-BiLSTM. This method incorporates a feature extraction enhancement module based on the improved Pan–Tompkins algorithm and 1D-MTCNN. This enhances the accuracy of heart rate signal extraction and eyelid features. Furthermore, we use one-dimensional neural networks to construct two models based on heart rate and PERCLOS values, forming a fatigue detection model. To enhance the robustness and accuracy of fatigue detection, the trained model data results are input into the BiLSTM network. This generates a time-fitting relationship between the data extracted from the CNN, allowing for effective dynamic modeling and achieving multi-modal fusion fatigue detection. Numerous experiments validate the effectiveness of the proposed method, achieving an accuracy of 98.2% on the self-made MDAD (Multi-Modal Driver Alertness Dataset). This underscores the feasibility of the algorithm. In comparison with traditional methods, our approach demonstrates higher accuracy and positively contributes to maintaining traffic safety, thereby advancing the field of smart transportation. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
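A minimal sketch of the temporal, multimodal part of such a pipeline, assuming per-window heart-rate and PERCLOS values are already extracted; it is not the authors' RPPMT-CNN-BiLSTM implementation, and the hidden size, window count, and helper perclos() are hypothetical.

```python
import torch
import torch.nn as nn

def perclos(eye_closed_flags):
    """PERCLOS: fraction of frames in a window where the eyes are judged closed."""
    return float(sum(eye_closed_flags)) / max(len(eye_closed_flags), 1)

class FatigueBiLSTM(nn.Module):
    """BiLSTM over fused per-window features -> fatigue probability."""
    def __init__(self, feat_dim: int = 2, hidden: int = 32):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                               # x: (batch, windows, [heart_rate, perclos])
        out, _ = self.bilstm(x)                         # (batch, windows, 2*hidden)
        return torch.sigmoid(self.head(out[:, -1]))     # fatigue probability per sequence

if __name__ == "__main__":
    seq = torch.rand(4, 30, 2)                          # 4 drivers, 30 windows, 2 fused features
    print(FatigueBiLSTM()(seq).shape)                   # torch.Size([4, 1])
```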

21 pages, 6143 KiB  
Article
Smart Shelf System for Customer Behavior Tracking in Supermarkets
by John Anthony C. Jose, Christopher John B. Bertumen, Marianne Therese C. Roque, Allan Emmanuel B. Umali, Jillian Clara T. Villanueva, Richard Josiah TanAi, Edwin Sybingco, Jayne San Juan and Erwin Carlo Gonzales
Sensors 2024, 24(2), 367; https://doi.org/10.3390/s24020367 - 08 Jan 2024
Viewed by 1473
Abstract
Transactional data from point-of-sales systems may not consider customer behavior before purchasing decisions are finalized. A smart shelf system would be able to provide additional data for retail analytics. In previous works, the conventional approach has involved customers standing directly in front of products on a shelf. Data from instances where customers deviated from this convention, referred to as “cross-location”, were typically omitted. However, recognizing instances of cross-location is crucial when contextualizing multi-person and multi-product tracking for real-world scenarios. The monitoring of product association with customer keypoints through RANSAC modeling and particle filtering (PACK-RMPF) is a system that addresses cross-location, consisting of twelve load cell pairs for product tracking and a single camera for customer tracking. In this study, the time series vision data underwent further processing with R-CNN and StrongSORT. An NTP server enabled the synchronization of timestamps between the weight and vision subsystems. Multiple particle filtering predicted the trajectory of each customer’s centroid and wrist keypoints relative to the location of each product. RANSAC modeling was implemented on the particles to associate a customer with each event. Comparing system-generated customer–product interaction history with the shopping lists given to each participant, the system had a general average recall rate of 76.33% and 79% for cross-location instances over five runs. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
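The following toy sketch hints at two ingredients mentioned above, a constant-velocity particle prediction step and distance-based association of a shelf event with a tracked customer; it is not the PACK-RMPF system (the RANSAC modeling, keypoint detection, and load-cell synchronization are omitted), and all coordinates, noise levels, and identifiers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_particles(particles, dt=0.1, noise=5.0):
    """Constant-velocity motion model: particles are rows of [x, y, vx, vy] (pixels)."""
    particles = particles.copy()
    particles[:, 0:2] += particles[:, 2:4] * dt
    particles[:, 0:2] += rng.normal(0.0, noise, size=(len(particles), 2))
    return particles

def associate_event(product_xy, customer_particles):
    """Pick the customer whose predicted keypoint cloud lies closest to the product location."""
    distances = {
        cid: np.linalg.norm(p[:, 0:2] - product_xy, axis=1).mean()
        for cid, p in customer_particles.items()
    }
    return min(distances, key=distances.get)

# Toy example: two tracked customers, a weight-change event at product location (120, 80).
customers = {
    "customer_A": np.tile([100.0, 75.0, 2.0, 0.5], (200, 1)),
    "customer_B": np.tile([400.0, 300.0, -1.0, 0.0], (200, 1)),
}
customers = {cid: predict_particles(p) for cid, p in customers.items()}
print(associate_event(np.array([120.0, 80.0]), customers))  # -> "customer_A"
```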

17 pages, 4569 KiB  
Article
Exponential Fusion of Interpolated Frames Network (EFIF-Net): Advancing Multi-Frame Image Super-Resolution with Convolutional Neural Networks
by Hamed Elwarfalli, Dylan Flaute and Russell C. Hardie
Sensors 2024, 24(1), 296; https://doi.org/10.3390/s24010296 - 04 Jan 2024
Viewed by 1018
Abstract
Convolutional neural networks (CNNs) have become instrumental in advancing multi-frame image super-resolution (SR), a technique that merges multiple low-resolution images of the same scene into a high-resolution image. In this paper, a novel deep learning multi-frame SR algorithm is introduced. The proposed CNN model, named Exponential Fusion of Interpolated Frames Network (EFIF-Net), seamlessly integrates fusion and restoration within an end-to-end network. Key features of the new EFIF-Net include a custom exponentially weighted fusion (EWF) layer for image fusion and a modification of the Residual Channel Attention Network for restoration to deblur the fused image. Input frames are registered with subpixel accuracy using an affine motion model to capture the camera platform motion. The frames are externally upsampled using single-image interpolation. The interpolated frames are then fused with the custom EWF layer, employing subpixel registration information to give more weight to pixels with less interpolation error. Realistic image acquisition conditions are simulated to generate training and testing datasets with corresponding ground truths. The observation model captures optical degradation from diffraction and detector integration from the sensor. The experimental results demonstrate the efficacy of EFIF-Net using both simulated and real camera data. The real camera results use authentic, unaltered camera data without artificial downsampling or degradation. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
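A minimal NumPy sketch of exponentially weighted fusion in the spirit of the EWF layer described above, assuming registered, interpolated frames and a per-pixel registration-error estimate are available; the weight function, the alpha parameter, and the toy data are assumptions, since the actual layer is learned end to end inside EFIF-Net.

```python
import numpy as np

def exponential_fusion(frames, registration_error, alpha=4.0):
    """
    Fuse co-registered, interpolated frames with exponentially decaying weights.

    frames:             (N, H, W) stack of upsampled, registered frames
    registration_error: (N, H, W) per-pixel error estimate (e.g., interpolation distance)
    Pixels with lower error receive exponentially more weight.
    """
    weights = np.exp(-alpha * registration_error)
    weights /= weights.sum(axis=0, keepdims=True)     # normalize across frames per pixel
    return (weights * frames).sum(axis=0)

frames = np.random.rand(8, 64, 64)    # 8 interpolated low-resolution frames (toy data)
errors = np.random.rand(8, 64, 64)    # hypothetical per-pixel registration/interpolation error
print(exponential_fusion(frames, errors).shape)   # (64, 64)
```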

13 pages, 389 KiB  
Article
Deep Learning for Combating Misinformation in Multicategorical Text Contents
by Rafał Kozik, Wojciech Mazurczyk, Krzysztof Cabaj, Aleksandra Pawlicka, Marek Pawlicki and Michał Choraś
Sensors 2023, 23(24), 9666; https://doi.org/10.3390/s23249666 - 07 Dec 2023
Cited by 2 | Viewed by 1002
Abstract
Currently, one can observe the evolution of social media networks. In particular, humans are faced with the fact that, often, the opinion of an expert is as important and significant as the opinion of a non-expert. It is possible to observe changes and processes in traditional media that reduce the role of a conventional ‘editorial office’, placing gradual emphasis on the remote work of journalists and forcing increasingly frequent use of online sources rather than actual reporting work. As a result, social media has become an element of state security, as disinformation and fake news produced by malicious actors can manipulate readers, creating unnecessary debate on topics organically irrelevant to society. This causes a cascading effect, fear of citizens, and eventually threats to the state’s security. Advanced data sensors and deep machine learning methods have great potential to enable the creation of effective tools for combating the fake news problem. However, these solutions often need better model generalization in the real world due to data deficits. In this paper, we propose an innovative solution involving a committee of classifiers in order to tackle the fake news detection challenge. In that regard, we introduce a diverse set of base models, each independently trained on sub-corpora with unique characteristics. In particular, we use multi-label text category classification, which helps formulate an ensemble. The experiments were conducted on six different benchmark datasets. The results are promising and open the field for further research. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
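As a hedged illustration of the committee-of-classifiers idea (not the authors' models or datasets), the sketch below trains one base text classifier per sub-corpus and combines them with a soft vote; the sub-corpora, the TF-IDF plus logistic-regression base learner, and the 0.5 threshold are assumptions for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sub-corpora with different characteristics (real work would use benchmark datasets).
subcorpora = {
    "politics": (["tax cut announced", "fake poll numbers spread"], [0, 1]),
    "health":   (["vaccine trial results published", "miracle cure shared online"], [0, 1]),
}

# Train one base model per sub-corpus, forming the committee.
committee = []
for name, (texts, labels) in subcorpora.items():
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    committee.append(model)

def committee_predict(texts):
    """Soft vote: average the fake-news probability over all base classifiers."""
    probs = np.mean([m.predict_proba(texts)[:, 1] for m in committee], axis=0)
    return (probs >= 0.5).astype(int), probs

labels, scores = committee_predict(["new miracle cure for taxes"])
print(labels, scores)
```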

21 pages, 2761 KiB  
Article
Deep Learning-Based Child Handwritten Arabic Character Recognition and Handwriting Discrimination
by Maram Saleh Alwagdani and Emad Sami Jaha
Sensors 2023, 23(15), 6774; https://doi.org/10.3390/s23156774 - 28 Jul 2023
Cited by 1 | Viewed by 1734
Abstract
Handwritten Arabic character recognition has received increasing research interest in recent years. However, as of yet, the majority of the existing handwriting recognition systems have only focused on adult handwriting. In contrast, there have not been many studies conducted on child handwriting, nor has it been regarded as a major research issue yet. Compared to adults’ handwriting, children’s handwriting is more challenging since it often has lower quality, higher variation, and larger distortions. Furthermore, most of these designed and currently used systems for adult data have not been trained or tested for child data recognition purposes or applications. This paper presents a new convolution neural network (CNN) model for recognizing children’s handwritten isolated Arabic letters. Several experiments are conducted here to investigate and analyze the influence when training the model with different datasets of children, adults, and both to measure and compare performance in recognizing children’s handwritten characters and discriminating their handwriting from adult handwriting. In addition, a number of supplementary features are proposed based on empirical study and observations and are combined with CNN-extracted features to augment the child and adult writer-group classification. Lastly, the performance of the extracted deep and supplementary features is evaluated and compared using different classifiers, comprising Softmax, support vector machine (SVM), k-nearest neighbor (KNN), and random forest (RF), as well as different dataset combinations from Hijja for child data and AHCD for adult data. Our findings highlight that the training strategy is crucial, and the inclusion of adult data is influential in achieving an increased accuracy of up to around 93% in child handwritten character recognition. Moreover, the fusion of the proposed supplementary features with the deep features attains an improved performance in child handwriting discrimination by up to around 94%. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
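A minimal sketch of fusing CNN-extracted features with hand-crafted supplementary features for the child/adult writer-group classification described above; it is not the paper's architecture, and the image size, feature dimensions, and number of supplementary features are assumptions (the paper additionally evaluates SVM, KNN, and RF classifiers on such fused features).

```python
import torch
import torch.nn as nn

class LetterCNN(nn.Module):
    """Small CNN feature extractor for 32x32 grayscale letter images (illustrative only)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class WriterGroupClassifier(nn.Module):
    """Fuses deep features with hand-crafted supplementary features (e.g., stroke statistics)."""
    def __init__(self, feat_dim: int = 64, supp_dim: int = 6, classes: int = 2):
        super().__init__()
        self.cnn = LetterCNN(feat_dim)
        self.head = nn.Linear(feat_dim + supp_dim, classes)     # child vs. adult

    def forward(self, images, supp):
        fused = torch.cat([self.cnn(images), supp], dim=1)      # feature-level fusion
        return self.head(fused)

model = WriterGroupClassifier()
logits = model(torch.randn(8, 1, 32, 32), torch.randn(8, 6))
print(logits.shape)  # torch.Size([8, 2])
```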

17 pages, 1567 KiB  
Article
Facial Micro-Expression Recognition Enhanced by Score Fusion and a Hybrid Model from Convolutional LSTM and Vision Transformer
by Yufeng Zheng and Erik Blasch
Sensors 2023, 23(12), 5650; https://doi.org/10.3390/s23125650 - 16 Jun 2023
Cited by 2 | Viewed by 1588
Abstract
In the billions of faces that are shaped by thousands of different cultures and ethnicities, one thing remains universal: the way emotions are expressed. To take the next step in human–machine interactions, a machine (e.g., a humanoid robot) must be able to clarify facial emotions. Allowing systems to recognize micro-expressions affords the machine a deeper dive into a person’s true feelings, which will take human emotion into account while making optimal decisions. For instance, these machines will be able to detect dangerous situations, alert caregivers to challenges, and provide appropriate responses. Micro-expressions are involuntary and transient facial expressions capable of revealing genuine emotions. We propose a new hybrid neural network (NN) model capable of micro-expression recognition in real-time applications. Several NN models are first compared in this study. Then, a hybrid NN model is created by combining a convolutional neural network (CNN), a recurrent neural network (RNN, e.g., long short-term memory (LSTM)), and a vision transformer. The CNN can extract spatial features (within a neighborhood of an image), whereas the LSTM can summarize temporal features. In addition, a transformer with an attention mechanism can capture sparse spatial relations residing in an image or between frames in a video clip. The inputs of the model are short facial videos, while the outputs are the micro-expressions recognized from the videos. The NN models are trained and tested with publicly available facial micro-expression datasets to recognize different micro-expressions (e.g., happiness, fear, anger, surprise, disgust, sadness). Score fusion and improvement metrics are also presented in our experiments. The results of our proposed models are compared with that of literature-reported methods tested on the same datasets. The proposed hybrid model performs the best, where score fusion can dramatically increase recognition performance. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
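Score fusion, highlighted above as the largest source of improvement, can be illustrated with a short NumPy sketch that averages the softmax scores of several recognizers and picks the top class; the model names, weights, and six-class setup are assumptions, not the authors' exact fusion rule.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def score_fusion(logits_per_model, weights=None):
    """
    Fuse classifier scores from several models (e.g., a CNN-LSTM and a vision transformer)
    by a weighted average of their softmax outputs, then pick the top class per sample.
    """
    scores = np.stack([softmax(l) for l in logits_per_model])   # (models, samples, classes)
    weights = np.ones(len(scores)) if weights is None else np.asarray(weights, float)
    weights = weights / weights.sum()
    fused = np.tensordot(weights, scores, axes=1)               # (samples, classes)
    return fused.argmax(axis=1), fused

# Toy logits for 3 clips x 6 micro-expressions from two hypothetical models.
cnn_lstm_logits = np.random.randn(3, 6)
vit_logits = np.random.randn(3, 6)
labels, fused = score_fusion([cnn_lstm_logits, vit_logits], weights=[0.6, 0.4])
print(labels)
```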

16 pages, 834 KiB  
Article
Fusion of Multi-Modal Features to Enhance Dense Video Caption
by Xuefei Huang, Ka-Hou Chan, Weifan Wu, Hao Sheng and Wei Ke
Sensors 2023, 23(12), 5565; https://doi.org/10.3390/s23125565 - 14 Jun 2023
Cited by 3 | Viewed by 1276
Abstract
Dense video caption is a task that aims to help computers analyze the content of a video by generating abstract captions for a sequence of video frames. However, most of the existing methods only use visual features in the video and ignore the audio features that are also essential for understanding the video. In this paper, we propose a fusion model that combines the Transformer framework to integrate both visual and audio features in the video for captioning. We use multi-head attention to deal with the variations in sequence lengths between the models involved in our approach. We also introduce a Common Pool to store the generated features and align them with the time steps, thus filtering the information and eliminating redundancy based on the confidence scores. Moreover, we use LSTM as a decoder to generate the description sentences, which reduces the memory size of the entire network. Experiments show that our method is competitive on the ActivityNet Captions dataset. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
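A minimal sketch of cross-modal fusion with multi-head attention, where visual tokens attend to an audio sequence of a different length, as the abstract describes; it is not the paper's full Transformer/Common Pool/LSTM pipeline, and the embedding size, head count, and residual-plus-norm wiring are assumptions.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Visual tokens attend to audio tokens; the two sequences may have different lengths."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual, audio):
        fused, _ = self.attn(query=visual, key=audio, value=audio)
        return self.norm(visual + fused)            # residual connection keeps the visual stream

visual = torch.randn(2, 120, 256)   # 2 videos, 120 frame features
audio = torch.randn(2, 300, 256)    # 2 videos, 300 audio features (different sequence length)
print(CrossModalFusion()(visual, audio).shape)      # torch.Size([2, 120, 256])
```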

19 pages, 13663 KiB  
Article
Chained Deep Learning Using Generalized Cross-Entropy for Multiple Annotators Classification
by Jenniffer Carolina Triana-Martinez, Julian Gil-González, Jose A. Fernandez-Gallego, Andrés Marino Álvarez-Meza and Cesar German Castellanos-Dominguez
Sensors 2023, 23(7), 3518; https://doi.org/10.3390/s23073518 - 28 Mar 2023
Cited by 1 | Viewed by 1529
Abstract
Supervised learning requires the accurate labeling of instances, usually provided by an expert. Crowdsourcing platforms offer a practical and cost-effective alternative for large datasets when individual annotation is impractical. In addition, these platforms gather labels from multiple labelers. Still, traditional multiple-annotator methods must account for the varying levels of expertise and the noise introduced by unreliable outputs, resulting in decreased performance. In addition, they assume a homogeneous behavior of the labelers across the input feature space, and independence constraints are imposed on outputs. We propose a Generalized Cross-Entropy-based framework using Chained Deep Learning (GCECDL) to code each annotator’s non-stationary patterns regarding the input space while preserving the inter-dependencies among experts through a chained deep learning approach. Experimental results devoted to multiple-annotator classification tasks on several well-known datasets demonstrate that our GCECDL can achieve robust predictive properties, outperforming state-of-the-art algorithms by combining the power of deep learning with a noise-robust loss function to deal with noisy labels. Moreover, network self-regularization is achieved by estimating each labeler’s reliability within the chained approach. Lastly, visual inspection and relevance analysis experiments are conducted to reveal the non-stationary coding of our method. In a nutshell, GCECDL weights reliable labelers as a function of each input sample and achieves suitable discrimination performance with preserved interpretability regarding each annotator’s trustworthiness estimation. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
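The noise-robust loss at the core of this line of work, the generalized cross-entropy L_q = (1 - p_y^q)/q, can be written in a few lines of PyTorch; the sketch below shows only that loss under an assumed q, not the chained per-annotator architecture of GCECDL.

```python
import torch

def generalized_cross_entropy(logits, targets, q: float = 0.7):
    """
    Noise-robust generalized cross-entropy loss, L_q = (1 - p_y^q) / q.
    It approaches MAE as q -> 1 and standard cross-entropy as q -> 0,
    which makes it more tolerant of noisy (e.g., crowdsourced) labels.
    """
    probs = torch.softmax(logits, dim=1)
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(1e-7)
    return ((1.0 - p_true.pow(q)) / q).mean()

logits = torch.randn(16, 5, requires_grad=True)    # 16 samples, 5 classes
noisy_labels = torch.randint(0, 5, (16,))          # labels from one (possibly unreliable) annotator
loss = generalized_cross_entropy(logits, noisy_labels)
loss.backward()
print(float(loss))
```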

14 pages, 8933 KiB  
Article
Enhancing Mask Transformer with Auxiliary Convolution Layers for Semantic Segmentation
by Zhengyu Xia and Joohee Kim
Sensors 2023, 23(2), 581; https://doi.org/10.3390/s23020581 - 04 Jan 2023
Cited by 3 | Viewed by 2184
Abstract
Transformer-based semantic segmentation methods have achieved excellent performance in recent years. Mask2Former is one of the well-known transformer-based methods which unifies common image segmentation into a universal model. However, it performs relatively poorly in obtaining local features and segmenting small objects due to relying heavily on transformers. To this end, we propose a simple yet effective architecture that introduces auxiliary branches to Mask2Former during training to capture dense local features on the encoder side. The obtained features help improve the performance of learning local information and segmenting small objects. Since the proposed auxiliary convolution layers are required only for training and can be removed during inference, the performance gain can be obtained without additional computation at inference. Experimental results show that our model can achieve state-of-the-art performance (57.6% mIoU) on the ADE20K and (84.8% mIoU) on the Cityscapes datasets. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
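A generic, hypothetical sketch of the train-only auxiliary-branch idea, with a convolutional head that contributes an extra loss during training and is skipped at inference; it is not Mask2Former or the authors' exact auxiliary layers, and the toy backbone, channel counts, and class count are placeholders.

```python
import torch
import torch.nn as nn

class AuxSegHead(nn.Module):
    """Lightweight convolutional head attached to an encoder feature map during training."""
    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, 1),
        )

    def forward(self, feat):
        return self.head(feat)

class SegmenterWithAux(nn.Module):
    def __init__(self, backbone: nn.Module, main_head: nn.Module, aux_head: AuxSegHead):
        super().__init__()
        self.backbone, self.main_head, self.aux_head = backbone, main_head, aux_head

    def forward(self, images):
        feat = self.backbone(images)
        logits = self.main_head(feat)
        if self.training:                     # auxiliary branch only adds a training loss
            return logits, self.aux_head(feat)
        return logits                         # removed at inference: no extra computation

# Toy stand-ins for the real backbone and mask head.
backbone = nn.Conv2d(3, 64, 3, padding=1)
main_head = nn.Conv2d(64, 19, 1)
model = SegmenterWithAux(backbone, main_head, AuxSegHead(64, 19))
model.train()
main_logits, aux_logits = model(torch.randn(2, 3, 128, 128))
model.eval()
print(model(torch.randn(2, 3, 128, 128)).shape)   # only the main output remains
```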

11 pages, 2439 KiB  
Article
Research on Waste Plastics Classification Method Based on Multi-Scale Feature Fusion
by Zhenxing Cai, Jianhong Yang, Huaiying Fang, Tianchen Ji, Yangyang Hu and Xin Wang
Sensors 2022, 22(20), 7974; https://doi.org/10.3390/s22207974 - 19 Oct 2022
Cited by 1 | Viewed by 1764
Abstract
Microplastic particles produced by non-degradable waste plastic bottles have a critical impact on the environment. Reasonable recycling is a premise that protects the environment and improves economic benefits. In this paper, a multi-scale feature fusion method for RGB and hyperspectral images based on Segmenting Objects by Locations (RHFF-SOLOv1) is proposed, which uses multi-sensor fusion technology to improve the accuracy of identifying transparent polyethylene terephthalate (PET) bottles, blue PET bottles, and transparent polypropylene (PP) bottles on a black conveyor belt. A line-scan camera and near-infrared (NIR) hyperspectral camera covering the spectral range from 935.9 nm to 1722.5 nm are used to obtain RGB and hyperspectral images synchronously. Moreover, we propose a hyperspectral feature band selection method that effectively reduces the dimensionality and selects the bands from 1087.6 nm to 1285.1 nm as the features of the hyperspectral image. The results show that the proposed fusion method improves the accuracy of plastic bottle classification compared with the SOLOv1 method, and the overall accuracy is 95.55%. Finally, compared with other space-spectral fusion methods, RHFF-SOLOv1 is superior to most of them and achieves the best (97.5%) accuracy in blue bottle classification. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
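A small NumPy sketch of the two preprocessing steps the abstract makes explicit, selecting the feature bands from 1087.6 nm to 1285.1 nm and combining them with the RGB channels; the band count, image size, and naive channel stacking are assumptions, since RHFF-SOLOv1 performs the actual fusion inside the segmentation network.

```python
import numpy as np

def select_bands(cube, wavelengths_nm, low=1087.6, high=1285.1):
    """Keep only the hyperspectral bands inside the chosen feature range (in nm)."""
    mask = (wavelengths_nm >= low) & (wavelengths_nm <= high)
    return cube[:, :, mask]

def stack_rgb_hsi(rgb, hsi_selected):
    """Channel-level fusion: concatenate the RGB image with the selected NIR bands."""
    return np.concatenate([rgb.astype(np.float32) / 255.0, hsi_selected], axis=2)

# Toy data: a 224-band NIR cube covering ~935.9-1722.5 nm and a spatially aligned RGB image.
wavelengths = np.linspace(935.9, 1722.5, 224)
cube = np.random.rand(64, 64, 224).astype(np.float32)
rgb = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)

selected = select_bands(cube, wavelengths)
fused = stack_rgb_hsi(rgb, selected)
print(selected.shape, fused.shape)
```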

16 pages, 3876 KiB  
Article
Research on an Algorithm of Express Parcel Sorting Based on Deeper Learning and Multi-Information Recognition
by Xing Xu, Zhenpeng Xue and Yun Zhao
Sensors 2022, 22(17), 6705; https://doi.org/10.3390/s22176705 - 05 Sep 2022
Cited by 7 | Viewed by 2181
Abstract
With the development of smart logistics, current small distribution centers have begun to use intelligent equipment to indirectly read bar code information on courier sheets to carry out express sorting. However, limited by cost, most of them choose relatively low-end sorting equipment in a complex warehouse environment. This single-information identification method leads to a decline in the identification rate of sorting, affecting the efficiency of the entire express sorting process. Aimed at the above problems, an express recognition method based on deeper learning and multi-information fusion is proposed. The method mainly targets the bar code information and the three segments of code information on the courier sheet and is divided into two parts: target information detection and recognition. For the detection of target information, we used a deeper learning method to detect the target, and to improve speed and precision we designed a target detection network based on the existing YOLOv4 network. Experiments show that the detection accuracy and speed of the redesigned target detection network were much improved. Next, for the recognition of the two kinds of target information, we first cropped the image after positioning and used the ZBAR algorithm to decode the cropped barcode image. Then we used Tesseract-OCR technology to identify the cropped three-segment code image information, and finally output the information in the form of strings. This deeper learning-based multi-information identification method can help logistics centers to accurately obtain express sorting information from the database. The experimental results show that the time to detect a picture was 0.31 s, and the recognition accuracy was 98.5%, which has better robustness and accuracy than single barcode information positioning and recognition alone. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
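Since the recognition stage relies on ZBAR and Tesseract-OCR, a hedged sketch of that stage might look as follows, using the pyzbar and pytesseract Python bindings on crops produced by an upstream detector; the crop file names and the whitespace cleanup are hypothetical, and the YOLOv4-based detection step is not shown.

```python
# Requires: pip install pyzbar pytesseract pillow (plus the Tesseract binary installed).
from PIL import Image
from pyzbar.pyzbar import decode as zbar_decode
import pytesseract

def read_courier_crops(barcode_crop_path: str, segment_code_crop_path: str) -> dict:
    """Decode a cropped barcode with ZBar and read a cropped three-segment code with Tesseract."""
    result = {"barcode": None, "segment_code": None}

    barcodes = zbar_decode(Image.open(barcode_crop_path))
    if barcodes:                                        # ZBar returns a list of decoded symbols
        result["barcode"] = barcodes[0].data.decode("utf-8")

    text = pytesseract.image_to_string(Image.open(segment_code_crop_path))
    result["segment_code"] = " ".join(text.split())     # collapse whitespace into one string

    return result

if __name__ == "__main__":
    # Hypothetical crop files produced by the upstream detector (e.g., a YOLO-style network).
    print(read_courier_crops("barcode_crop.png", "segments_crop.png"))
```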

15 pages, 2550 KiB  
Article
Convolutional Neural Network Approach Based on Multimodal Biometric System with Fusion of Face and Finger Vein Features
by Yang Wang, Dekai Shi and Weibin Zhou
Sensors 2022, 22(16), 6039; https://doi.org/10.3390/s22166039 - 12 Aug 2022
Cited by 21 | Viewed by 2958
Abstract
In today’s information age, how to accurately identify a person’s identity and protect information security has become a hot topic for people from all walks of life. At present, a more convenient and secure solution to identity verification is undoubtedly biometric identification, but a single biometric modality cannot support increasingly complex and diversified authentication scenarios. Using multimodal biometric technology can improve the accuracy and safety of identification. This paper proposes a biometric method based on finger vein and face bimodal feature-layer fusion, which uses a convolutional neural network (CNN), with the fusion occurring in the feature layer. A self-attention mechanism is used to obtain the weights of the two biometrics, and, combined with the ResNet residual structure, the self-attention-weighted features are concatenated (Concat) with the bimodal fusion features along the channel dimension. To prove the high efficiency of bimodal feature-layer fusion, the AlexNet and VGG-19 network models were selected in the experimental part for extracting finger vein and face image features as inputs to the feature fusion module. The extensive experiments show that the recognition accuracy of both models exceeds 98.4%, demonstrating the high efficiency of the bimodal feature fusion. Full article
(This article belongs to the Special Issue Deep Learning for Information Fusion and Pattern Recognition)
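A minimal sketch of attention-weighted, feature-level fusion of face and finger-vein descriptors followed by channel concatenation (Concat), in the spirit of the method described above; the simplified modality-level attention, feature dimensions, and class count are assumptions rather than the authors' exact ResNet-based design.

```python
import torch
import torch.nn as nn

class BimodalFusion(nn.Module):
    """Weights face and finger-vein feature vectors with learned attention, then concatenates them."""
    def __init__(self, feat_dim: int = 256, classes: int = 100):
        super().__init__()
        self.attn = nn.Sequential(                  # scores one weight per modality
            nn.Linear(feat_dim, 64), nn.Tanh(), nn.Linear(64, 1),
        )
        self.classifier = nn.Linear(2 * feat_dim, classes)

    def forward(self, face_feat, vein_feat):
        stacked = torch.stack([face_feat, vein_feat], dim=1)        # (batch, 2, feat_dim)
        weights = torch.softmax(self.attn(stacked), dim=1)          # (batch, 2, 1)
        weighted = stacked * weights                                # reweight each modality
        fused = torch.cat([weighted[:, 0], weighted[:, 1]], dim=1)  # Concat along the channel axis
        return self.classifier(fused)

face = torch.randn(8, 256)    # e.g., features from a face CNN (VGG-19 in the paper)
vein = torch.randn(8, 256)    # e.g., features from a finger-vein CNN (AlexNet in the paper)
print(BimodalFusion()(face, vein).shape)   # torch.Size([8, 100])
```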
