Machine Learning and Deep Learning Based Pattern Recognition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 30 September 2024 | Viewed by 11677

Special Issue Editors


Prof. Dr. Jungpil Shin
Guest Editor
Pattern Processing Lab, School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan
Interests: pattern recognition; character recognition; image processing; computer vision; human–computer interaction; neurological disease analysis; machine learning

Prof. Dr. Md. Al Mehedi Hasan
Guest Editor
Computer Science and Engineering, Rajshahi University of Engineering and Technology (RUET), Rajshahi 6204, Bangladesh
Interests: bioinformatics; artificial intelligence; pattern recognition; medical image and signal processing; machine learning; computer vision

Dr. Hoang D. Le
Guest Editor
Computer Communications Laboratory, School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan
Interests: applications of artificial intelligence/machine learning for wireless networks; wireless communication networks; network security

Special Issue Information

Dear Colleagues,

In the modern digital world, patterns appear in many facets of daily life. They can be observed physically or detected computationally using algorithms. In the digital environment, a pattern is represented by a feature vector or a feature matrix. Recently, numerous machine learning (ML)- and deep learning (DL)-based techniques have been used to handle and analyze these feature values in the artificial intelligence (AI) domain. ML is a branch of AI whose goal is to let the computer make its own decisions from pattern data with minimal human involvement. DL, in turn, is a branch of ML and a popular topic in the field of AI. Using ML and DL models to extract meaningful features from text, image, video, or sensor data and to analyze those features is known as pattern recognition (PR). PR has been used in various engineering applications, such as computer vision, sensor data analysis, natural language processing, speech recognition, robotics, and bioinformatics. The goal of this Special Issue is to publish innovative and technically sound research papers that make theoretical and practical contributions to PR using ML and DL methodologies.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Image processing/segmentation/recognition;
  • Computer vision;
  • Speech recognition;
  • Automated target recognition;
  • Character recognition;
  • Gesture and human activity recognition;
  • Industrial inspection;
  • Medical diagnosis;
  • Health informatics;
  • Biosignal processing;
  • Bioinformatics;
  • Remote sensing;
  • Healthcare applications;
  • ML and DL in the Internet of Things (IoT);
  • Large dataset analysis;
  • Current state-of-the-art and future trends of ML and DL.

Prof. Dr. Jungpil Shin
Prof. Dr. Md. Al Mehedi Hasan 
Dr. Hoang D. Le
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • pattern recognition

Published Papers (10 papers)

Research

19 pages, 2103 KiB  
Article
GDCP-YOLO: Enhancing Steel Surface Defect Detection Using Lightweight Machine Learning Approach
by Zhaohui Yuan, Hao Ning, Xiangyang Tang and Zhengzhe Yang
Electronics 2024, 13(7), 1388; https://doi.org/10.3390/electronics13071388 - 06 Apr 2024
Viewed by 473
Abstract
Surface imperfections in steel materials can degrade quality and performance, thereby escalating the risk of accidents in engineering applications. Manual inspection, while traditional, is laborious and lacks consistency. However, recent advancements in machine learning and computer vision have paved the way for automated steel defect detection, yielding superior accuracy and efficiency. This paper introduces an innovative deep learning model, GDCP-YOLO, devised for multi-category steel defect detection. We enhance the reference YOLOv8n architecture by incorporating adaptive receptive fields via the DCNV2 module and channel attention in C2f. These integrations aim to concentrate on valuable features and minimize parameters. We incorporate the efficient Faster Block and employ Ghost convolutions to generate more feature maps with reduced computation. These modifications streamline feature extraction, curtail redundant information processing, and boost detection accuracy and speed. Comparative trials on the NEU-DET dataset underscore the state-of-the-art performance of GDCP-YOLO. Ablation studies and generalization experiments reveal consistent performance across a variety of defect types. The optimized lightweight architecture facilitates real-time automated inspection without sacrificing accuracy, offering valuable insights for advancing deep learning techniques for surface defect identification across manufacturing sectors.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)

15 pages, 7888 KiB  
Article
Sign Language Recognition with Multimodal Sensors and Deep Learning Methods
by Chenghong Lu, Misaki Kozakai and Lei Jing
Electronics 2023, 12(23), 4827; https://doi.org/10.3390/electronics12234827 - 29 Nov 2023
Cited by 1 | Viewed by 987
Abstract
Sign language recognition is essential to communication for hearing-impaired people. Wearable data gloves and computer vision are partially complementary solutions. However, sign language recognition using a general monocular camera suffers from occlusion and recognition accuracy issues. In this research, we aim to improve accuracy through data fusion of 2-axis bending sensors and computer vision. We obtain the hand key point information of sign language movements captured by a monocular RGB camera and use the key points to calculate hand joint angles. The system achieves higher recognition accuracy by fusing multimodal data of the skeleton, joint angles, and finger curvature. To fuse the data effectively, we spliced the multimodal data and used CNN-BiLSTM to extract effective features for sign language recognition. The CNN learns spatial information, and the BiLSTM learns time series data. We built a data collection system with bending sensor data gloves and cameras. A dataset was collected that contains 32 Japanese sign language movements performed by seven people, including 27 static movements and 5 dynamic movements. Each movement is repeated 10 times, totaling about 112 min. In particular, we obtained data containing occlusions. Experimental results show that our system can fuse multimodal information and perform better than using only skeletal information, with the accuracy increasing from 68.34% to 84.13%.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)
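The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below assumes (the abstract does not state it explicitly) that each of the seven signers performed each of the 32 movements 10 times; all variable names are illustrative.

```python
# Figures taken from the abstract above.
static_movements = 27
dynamic_movements = 5
signers = 7
repetitions = 10          # assumed to be per signer and per movement
total_minutes = 112       # "about 112 min" in the abstract

movements = static_movements + dynamic_movements
recordings = movements * signers * repetitions
seconds_per_recording = total_minutes * 60 / recordings

# Accuracy gain of multimodal fusion over skeleton-only input, in percentage points.
accuracy_gain = round(84.13 - 68.34, 2)

print(movements, recordings, seconds_per_recording, accuracy_gain)
```

Under that assumption, the dataset would hold 2240 recordings of roughly 3 s each, which is consistent with the stated total duration.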

18 pages, 1035 KiB  
Article
Comparison of Different Methods for Building Ensembles of Convolutional Neural Networks
by Loris Nanni, Andrea Loreggia and Sheryl Brahnam
Electronics 2023, 12(21), 4428; https://doi.org/10.3390/electronics12214428 - 27 Oct 2023
Cited by 1 | Viewed by 770
Abstract
In computer vision and image analysis, Convolutional Neural Networks (CNNs) and other deep-learning models are at the forefront of research and development. These advanced models have proven to be highly effective in tasks related to computer vision. One technique that has gained prominence in recent years is the construction of ensembles using deep CNNs. These ensembles typically involve combining multiple pretrained CNNs to create a more powerful and robust network. The purpose of this study is to evaluate the effectiveness of building CNN ensembles by combining several advanced techniques. Tested here are CNN ensembles constructed by replacing ReLU layers with different activation functions, employing various data-augmentation techniques, and utilizing several algorithms, including some novel ones, that perturb network weights. Experiments across many datasets representing different tasks demonstrate that our proposed methods for building deep ensembles produce superior results.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)
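As a rough illustration of how the outputs of several networks can be combined, the sketch below averages per-model softmax probabilities (often called the sum rule). This is a generic fusion scheme chosen for illustration; the abstract does not say which fusion rule the authors use, and the logits are made-up values.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sum_rule(per_model_logits):
    """Fuse several classifiers by averaging their softmax outputs."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

# Hypothetical logits from three networks for one image and three classes.
logits = [
    [2.0, 1.0, 0.1],
    [0.5, 1.5, 0.2],
    [1.8, 0.3, 0.4],
]
fused = sum_rule(logits)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

In this toy case the second network prefers class 1, but the averaged scores favor class 0, which shows how an ensemble can override a single dissenting member.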

12 pages, 16406 KiB  
Article
A Study on Webtoon Generation Using CLIP and Diffusion Models
by Kyungho Yu, Hyoungju Kim, Jeongin Kim, Chanjun Chun and Pankoo Kim
Electronics 2023, 12(18), 3983; https://doi.org/10.3390/electronics12183983 - 21 Sep 2023
Viewed by 1204
Abstract
This study focuses on harnessing deep-learning-based text-to-image transformation techniques to support webtoon creators’ creative output. We converted publicly available datasets (e.g., MSCOCO) into a multimodal webtoon dataset using CartoonGAN. First, the dataset was leveraged for training contrastive language image pre-training (CLIP), a model composed of multi-lingual BERT and a Vision Transformer that learnt to associate text with images. Second, a pre-trained diffusion model was employed to generate webtoons through text and text-similar image input. The webtoon dataset comprised treatments (i.e., textual descriptions) paired with their corresponding webtoon illustrations. CLIP (operating through contrastive learning) extracted features from different data modalities and aligned similar data more closely within the same feature space while pushing dissimilar data apart. This model learnt the relationships between various modalities in multimodal data. To generate webtoons using the diffusion model, the process involved providing the CLIP features of the desired webtoon’s text with those of the most text-similar image to a pre-trained diffusion model. In the experiments, both single-text and continuous-text inputs were used to generate webtoons, and the results showed an inception score of 7.14 when using continuous-text inputs. The text-to-image technology developed here could streamline the webtoon creation process for artists by enabling the efficient generation of webtoons based on the provided text. However, the current inability to generate webtoons from multiple sentences or images while maintaining a consistent artistic style was noted. Therefore, further research is imperative to develop a text-to-image model capable of handling multi-sentence and multilingual input while ensuring coherence in the artistic style across the generated webtoon images.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)
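The contrastive objective mentioned in this abstract, which pulls matching text and image features together and pushes mismatched ones apart, can be sketched in a few lines. This is a minimal pure-Python version of a symmetric contrastive loss over cosine similarities; the two-dimensional features and the temperature of 0.07 are illustrative assumptions, not values from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(scores, target):
    """Negative log-softmax of the target entry (log-sum-exp for stability)."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]

def contrastive_loss(text_feats, image_feats, temperature=0.07):
    """Symmetric loss: pair k of texts and images is the positive, the rest negatives."""
    t = [normalize(v) for v in text_feats]
    i = [normalize(v) for v in image_feats]
    n = len(t)
    sim = [[sum(a * b for a, b in zip(t[r], i[c])) / temperature
            for c in range(n)] for r in range(n)]
    text_to_image = sum(cross_entropy(sim[r], r) for r in range(n)) / n
    image_to_text = sum(
        cross_entropy([sim[r][c] for r in range(n)], c) for c in range(n)
    ) / n
    return (text_to_image + image_to_text) / 2

# Toy features: in the aligned batch, image k points the same way as text k.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[0.9, 0.1], [0.2, 0.8]]
mismatched_images = [[0.2, 0.8], [0.9, 0.1]]

aligned_loss = contrastive_loss(texts, aligned_images)
mismatched_loss = contrastive_loss(texts, mismatched_images)
print(aligned_loss, mismatched_loss)
```

The loss is small when each text is most similar to its own image and large when the pairing is swapped, which is the signal that training would use to align the two feature spaces.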

22 pages, 7626 KiB  
Article
Principal Component Analysis-Based Logistic Regression for Rotated Handwritten Digit Recognition in Consumer Devices
by Chao-Chung Peng, Chao-Yang Huang and Yi-Ho Chen
Electronics 2023, 12(18), 3809; https://doi.org/10.3390/electronics12183809 - 08 Sep 2023
Viewed by 787
Abstract
Handwritten digit recognition has been used in many consumer electronic devices for a long time. However, we found that the recognition system used in current consumer electronics is sensitive to image or character rotations. To address this problem, this study builds a low-cost handwritten digit recognition system with a light computational load. A Principal Component Analysis (PCA)-based logistic regression classifier is presented, which provides a certain degree of robustness to digit rotations. To validate the effectiveness of the developed image recognition algorithm, the popular MNIST dataset is used to conduct performance evaluations. Compared to other popular classifiers installed in MATLAB, the proposed method achieves better prediction results with a smaller model size, and is 18.5% better than traditional logistic regression. Finally, real-time experiments are conducted to verify the efficiency of the presented method, showing that the proposed system can successfully classify rotated handwritten digits.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)

12 pages, 306 KiB  
Article
Latent Regression Bayesian Network for Speech Representation
by Liang Xu, Yue Zhao, Xiaona Xu, Yigang Liu and Qiang Ji
Electronics 2023, 12(15), 3342; https://doi.org/10.3390/electronics12153342 - 04 Aug 2023
Viewed by 616
Abstract
In this paper, we present a novel approach for speech representation using latent regression Bayesian networks (LRBN) to address the issue of poor performance in low-resource language speech systems. LRBN, a lightweight unsupervised learning model, learns the data distribution and high-level features, unlike computationally expensive large models, such as Wav2vec 2.0. To evaluate the effectiveness of LRBN in learning speech representations, we conducted experiments on five different low-resource languages and applied the learned representations to two downstream tasks: phoneme classification and speech recognition. Our experimental results demonstrate that LRBN outperforms prevailing speech representation methods in both tasks, highlighting its potential in the realm of speech representation learning for low-resource languages.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)

20 pages, 4433 KiB  
Article
Maintain a Better Balance between Performance and Cost for Image Captioning by a Size-Adjustable Convolutional Module
by Yan Lyu, Yong Liu and Qiangfu Zhao
Electronics 2023, 12(14), 3187; https://doi.org/10.3390/electronics12143187 - 22 Jul 2023
Viewed by 1080
Abstract
Image captioning is a challenging AI problem that connects computer vision and natural language processing. Many deep learning (DL) models have been proposed in the literature for solving this problem. So far, the primary concern in image captioning has been increasing the accuracy of generating human-style sentences for describing given images. As a result, state-of-the-art (SOTA) models are often too expensive to be implemented in computationally weak devices. In contrast, the primary concern of this paper is to maintain a balance between performance and cost. For this purpose, we propose using a DL model pre-trained for object detection to encode the given image so that features of various objects can be extracted simultaneously. We also propose adding a size-adjustable convolutional module (SACM) before decoding the features into sentences. The experimental results show that the model with the properly adjusted SACM could reach a BLEU-1 score of 82.3 and a BLEU-4 score of 43.9 on the Flickr 8K dataset, and a BLEU-1 score of 83.1 and a BLEU-4 score of 44.3 on the MS COCO dataset. With the SACM, the number of parameters is decreased to 108M, which is about 1/4 of the original YOLOv3-LSTM model with 430M parameters. Specifically, compared with mPLUG with 510M parameters, which is one of the SOTA methods, the proposed method can achieve almost the same BLEU-4 scores, but its number of parameters is 78% lower than that of mPLUG.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)
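The two size comparisons in this abstract follow directly from the quoted parameter counts. The short check below uses only the numbers stated above; the variable names are illustrative.

```python
# Parameter counts quoted in the abstract (in parameters, not bytes).
proposed_params = 108e6      # model with the SACM
yolov3_lstm_params = 430e6   # original YOLOv3-LSTM model
mplug_params = 510e6         # mPLUG

# Fraction of the original model that remains ("about 1/4").
ratio_to_original = proposed_params / yolov3_lstm_params

# Relative reduction compared with mPLUG ("78% less").
reduction_vs_mplug = 1 - proposed_params / mplug_params

print(f"{ratio_to_original:.3f} {reduction_vs_mplug:.3f}")
```

The ratio comes out at about 0.25 and the reduction at just under 0.79, which agrees with the "about 1/4" and "78%" figures in the abstract.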

15 pages, 1906 KiB  
Article
Multi-Stream General and Graph-Based Deep Neural Networks for Skeleton-Based Sign Language Recognition
by Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Si-Woong Jang, Hyoun-Sup Lee and Jungpil Shin
Electronics 2023, 12(13), 2841; https://doi.org/10.3390/electronics12132841 - 27 Jun 2023
Cited by 6 | Viewed by 1158
Abstract
Sign language recognition (SLR) aims to bridge speech-impaired and general communities by recognizing signs from given videos. However, due to the complex backgrounds, illumination, and subject structures in videos, researchers still face challenges in developing effective SLR systems. Many researchers have recently sought to develop skeleton-based sign language recognition systems to overcome the subject and background variation in hand gesture sign videos. However, skeleton-based SLR is still under exploration, mainly due to a lack of information and hand key point annotations. More recently, researchers have included body and face information along with hand gesture information for SLR; however, the obtained performance accuracy and generalizability properties remain unsatisfactory. In this paper, we propose a multi-stream graph-based deep neural network (SL-GDN) for a skeleton-based SLR system in order to overcome the above-mentioned problems. The main purpose of the proposed SL-GDN approach is to improve the generalizability and performance accuracy of the SLR system while maintaining a low computational cost based on the human body pose in the form of 2D landmark locations. We first construct a skeleton graph based on 27 whole-body key points selected among 67 key points to address the high computational cost problem. Then, we utilize the multi-stream SL-GDN to extract features from the whole-body skeleton graph considering four streams. Finally, we concatenate the four different features and apply a classification module to refine the features and recognize corresponding sign classes. Our data-driven graph construction method increases the system’s flexibility and brings high generalizability, allowing it to adapt to varied data. We use two large-scale benchmark SLR data sets to evaluate the proposed model: the Turkish Sign Language data set (AUTSL) and the Chinese Sign Language data set (CSL). The reported performance accuracy results demonstrate the outstanding ability of the proposed model, and we believe that it will be considered a great innovation in the SLR domain.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)

19 pages, 17533 KiB  
Article
Stochastic Neighbor Embedding Feature-Based Hyperspectral Image Classification Using 3D Convolutional Neural Network
by Md. Moazzem Hossain, Md. Ali Hossain, Abu Saleh Musa Miah, Yuichi Okuyama, Yoichi Tomioka and Jungpil Shin
Electronics 2023, 12(9), 2082; https://doi.org/10.3390/electronics12092082 - 02 May 2023
Cited by 6 | Viewed by 1411
Abstract
The ample amount of information from hyperspectral image (HSI) bands allows the non-destructive detection and recognition of earth objects. However, dimensionality reduction (DR) of HSIs is required before classification, as the classifier may suffer from the curse of dimensionality. Therefore, dimensionality reduction plays a significant role in HSI data analysis (e.g., effective processing and seamless interpretation). In this article, a technique known as t-Distributed Stochastic Neighbor Embedding (tSNE), applied after dimension reduction and combined with a blended CNN, was implemented to improve the visualization and characterization of HSI. In the procedure, first, we employed principal component analysis (PCA) to reduce the HSI dimensions and remove non-linear consistency features between the wavelengths to project them to a smaller scale. Then, we applied tSNE to preserve the local and global pixel relationships and check the HSI information visually and experimentally. This step yielded two-dimensional data, improving the visualization and classification accuracy compared to other standard dimensionality-reduction algorithms. Finally, we employed a deep-learning-based CNN to classify the reduced and improved HSI intra- and inter-band relationship-feature vector. The evaluation performance of 95.21% accuracy and 6.2% test loss proved the superiority of the proposed model compared to other state-of-the-art DR algorithms.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)

11 pages, 431 KiB  
Communication
Supervised Learning Spectrum Sensing Method via Geometric Power Feature
by Qian Hu, Zhongqiang Luo and Wenshi Xiao
Electronics 2023, 12(7), 1616; https://doi.org/10.3390/electronics12071616 - 29 Mar 2023
Cited by 1 | Viewed by 941
Abstract
To improve spectrum sensing (SS) performance under a low signal-to-noise ratio (SNR), this paper proposes a supervised learning spectrum sensing method based on the geometric power (GP) feature. The GP is used as the feature vector in the supervised learning spectrum sensing method for training and testing based on the actual captured data set. Experimental results show that the detection performance of the GP-based supervised learning spectrum sensing method is better than that of the Energy Statistics (ES) and Differential Entropy (DE)-based supervised learning spectrum sensing methods.
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)
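To give a feel for the kind of feature this abstract refers to, the sketch below computes a geometric-power statistic for two synthetic sample sets. It assumes the common definition of geometric power as the exponential of the mean log-magnitude, exp(E[log|x|]); the paper's exact feature construction may differ, and the sinusoidal "signal", the noise level, and the sample count are invented for illustration.

```python
import math
import random

def geometric_power(samples, eps=1e-12):
    """exp(mean(log|x|)); eps guards against log(0) for exactly-zero samples."""
    log_sum = sum(math.log(abs(x) + eps) for x in samples)
    return math.exp(log_sum / len(samples))

random.seed(0)  # fixed seed so the toy data is reproducible
n = 2000

# Channel with noise only versus channel with a (synthetic) signal plus noise.
noise_only = [random.gauss(0.0, 1.0) for _ in range(n)]
occupied = [2.0 * math.sin(0.1 * k) + random.gauss(0.0, 1.0) for k in range(n)]

gp_noise = geometric_power(noise_only)
gp_occupied = geometric_power(occupied)
print(gp_noise, gp_occupied)
```

In a supervised setting, values like these would be collected over many sensing windows, labeled as occupied or idle, and passed to a classifier; this sketch stops at the feature computation.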