Signal and Image Processing Applications in Artificial Intelligence

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 17 December 2024 | Viewed by 18258

Special Issue Editors

ITI/Larsys/Higher School of Technology and Management, University of Madeira, 9020-105 Funchal, Portugal
Interests: signal processing; sleep analysis; machine learning; biomedical analysis
Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal
Interests: CNN; deep learning; sleep apnea; sensors for sleep apnea; RNN; deep neural network

Special Issue Information

Dear Colleagues,

This Special Issue focuses on general applications of artificial intelligence using signal and image processing, covering a broad range of research topics where data-driven decision-making is employed.

This Special Issue aims to showcase innovative research cases in which artificial intelligence has provided new perspectives.

The relevance of these topics is continuously increasing, as machine-learning-based components are becoming the backbone of modern systems that use the Internet of Things (IoT) to bring intelligence to our daily activities. These intelligent devices provide outstanding data collection capabilities, but sophisticated data-driven algorithms are needed to process the generated information. Likewise, this Special Issue aims to address the impact of machine learning, and of novel deep learning methods applied to big data, on both conventional and new research problems, studying how these techniques can help improve the state of the art.

Topics of interest include, but are not limited to:

  • Data-driven algorithms based on data generated by IoT systems;
  • Applications of transfer learning for image processing;
  • Object detection using machine learning;
  • Feature selection techniques;
  • IoT-based applications with data analysis;
  • Real-world applications of machine learning.

Dr. Fabio Mendonca
Dr. Morgado Dias
Dr. Sheikh Shanawaz Mostafa
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • image classification
  • big data
  • signal processing
  • explainable machine learning
  • Internet of Things
  • model-agnostic techniques
  • imaging analytics
  • machine learning and deep learning
  • feature selection

Published Papers (10 papers)


Research


14 pages, 450 KiB  
Article
Online Mongolian Handwriting Recognition Based on Encoder–Decoder Structure with Language Model
by Daoerji Fan, Yuxin Sun, Zhixin Wang and Yanjun Peng
Electronics 2023, 12(20), 4194; https://doi.org/10.3390/electronics12204194 - 10 Oct 2023
Cited by 1 | Viewed by 905
Abstract
Mongolian online handwriting recognition is a complex task due to the script’s intricate characters and extensive vocabulary. This study proposes a novel approach by integrating a pre-trained language model into the sequence-to-sequence (Seq2Seq) + attention mechanisms (AM) model to enhance recognition accuracy. Three fusion models, including former, latter, and complete fusion, are introduced, showing substantial improvements over the baseline model. The complete fusion model, combined with synchronized language model parameters, achieved the best results, significantly reducing character and word error rates. This research presents a promising solution for accurate Mongolian online handwriting recognition, offering practical applications in preserving and utilizing the Mongolian script. Full article
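
The former/latter/complete fusion variants are specific to the paper; as a rough illustration of combining a pre-trained language model with a Seq2Seq decoder, the sketch below applies shallow (log-linear) fusion at a single decoding step. The vocabulary size and fusion weight `lam` are assumptions, not the authors' settings.

```python
import torch
import torch.nn.functional as F

def shallow_fusion_step(decoder_logits: torch.Tensor,
                        lm_logits: torch.Tensor,
                        lam: float = 0.3) -> torch.Tensor:
    """Combine Seq2Seq decoder and language-model predictions for one step.

    decoder_logits, lm_logits: (batch, vocab_size) unnormalised scores.
    lam: interpolation weight for the language model (hypothetical value).
    """
    log_p_dec = F.log_softmax(decoder_logits, dim=-1)
    log_p_lm = F.log_softmax(lm_logits, dim=-1)
    fused = log_p_dec + lam * log_p_lm      # log-linear (shallow) fusion
    return fused.argmax(dim=-1)             # next-token prediction

# usage with random scores for a 1000-symbol vocabulary
dec = torch.randn(2, 1000)
lm = torch.randn(2, 1000)
next_tokens = shallow_fusion_step(dec, lm)
```
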
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

16 pages, 5101 KiB  
Article
X-ray Detection of Prohibited Item Method Based on Dual Attention Mechanism
by Ying Li, Changshe Zhang, Shiyu Sun and Guangsong Yang
Electronics 2023, 12(18), 3934; https://doi.org/10.3390/electronics12183934 - 18 Sep 2023
Cited by 1 | Viewed by 923
Abstract
Prohibited item detection plays a significant role in ensuring public safety, as the timely and accurate identification of prohibited items ensures the safety of lives and property. X-ray transmission imaging technology is commonly employed for prohibited item detection in public spaces, producing X-ray images of luggage to visualize their internal contents. However, challenges such as multiple object overlapping, varying angles, loss of details, and small targets in X-ray transmission imaging pose significant obstacles to prohibited item detection. Therefore, a dual attention mechanism network (DAMN) for X-ray prohibited item detection is proposed. The DAMN consists of three modules, i.e., spatial attention, channel attention, and dependency relationship optimization. A long-range dependency model is achieved by employing a dual attention mechanism with spatial and channel attention, effectively extracting feature information. Meanwhile, the dependency relationship module is integrated to address the shortcomings of traditional convolutional networks in terms of short-range correlations. We conducted experiments comparing the DAMN with several existing algorithms on datasets containing 12 categories of prohibited items, including firearms and knives. The results show that the DAMN has a good performance, particularly in scenarios involving small object detection, detail loss, and target overlap under complex conditions. Specifically, the detection average precision of the DAMN reaches 63.8%, with a segmentation average precision of 54.7%. Full article
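
The DAMN architecture itself is detailed in the paper; as a rough illustration of a dual attention mechanism (channel attention followed by spatial attention over a convolutional feature map), a minimal PyTorch sketch is shown below. The reduction ratio, kernel size, and module ordering are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                       # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))      # squeeze spatial dims
        mx = self.mlp(x.amax(dim=(2, 3)))
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # re-weight channels

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)       # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                            # re-weight locations

class DualAttentionBlock(nn.Module):
    """Channel attention followed by spatial attention (CBAM-style sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))

feat = torch.randn(1, 64, 80, 80)               # backbone feature map
out = DualAttentionBlock(64)(feat)
```
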
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

19 pages, 1250 KiB  
Article
A Multi-Stage Acoustic Echo Cancellation Model Based on Adaptive Filters and Deep Neural Networks
by Shiyun Xu, Changjun He, Bosong Yan and Mingjiang Wang
Electronics 2023, 12(15), 3258; https://doi.org/10.3390/electronics12153258 - 28 Jul 2023
Viewed by 1072
Abstract
The presence of a large amount of echo significantly impairs the quality and intelligibility of speech during communication. To address this issue, numerous studies have been conducted and models developed to cancel echo. In this study, we propose a multi-stage acoustic echo cancellation model that utilizes an adaptive filter and a deep neural network. Our model consists of two parts: the Speex algorithm for canceling linear echo, and the multi-scale time-frequency UNet (MSTFUNet) for further echo cancellation. The Speex algorithm takes the far-end reference speech and the near-end microphone signal as inputs, and outputs the signal after linear echo cancellation. MSTFUNet takes the spectra of the far-end reference speech, the near-end microphone signal, and the output of Speex as inputs, and generates the estimated near-end speech spectrum as output. To enhance the performance of the Speex algorithm, we perform delay estimation and compensation on the far-end reference speech. For MSTFUNet, we employ multi-scale time-frequency processing to extract information from the input spectrum. Additionally, we incorporate an improved time-frequency self-attention mechanism to capture time-frequency information. Furthermore, we introduce channel time-frequency attention to alleviate information loss during downsampling and upsampling. In our experiments, we evaluate the performance of our proposed model on both our test set and the blind test set of the Acoustic Echo Cancellation challenge. Our proposed model exhibits superior performance in terms of acoustic echo cancellation and noise reverberation suppression compared to other models. Full article
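
The linear stage in the paper is the Speex canceller; the sketch below substitutes a basic normalised-LMS (NLMS) adaptive filter to illustrate the first, linear echo-cancellation stage, leaving the residual for a neural post-filter such as MSTFUNet (not implemented here). Filter length, step size, and the toy echo path are illustrative assumptions.

```python
import numpy as np

def nlms_echo_canceller(far_end: np.ndarray, mic: np.ndarray,
                        taps: int = 256, mu: float = 0.5,
                        eps: float = 1e-8) -> np.ndarray:
    """Stage 1: linear echo cancellation with an NLMS adaptive filter.

    far_end: reference signal played by the loudspeaker.
    mic:     near-end microphone signal (near speech + echo).
    Returns the error signal, i.e. mic with the linear echo estimate removed.
    """
    w = np.zeros(taps)
    out = np.zeros_like(mic)
    for n in range(taps, len(mic)):
        x = far_end[n - taps:n][::-1]          # most recent far-end samples
        echo_hat = w @ x                       # linear echo estimate
        e = mic[n] - echo_hat                  # residual (near speech + nonlinear echo)
        w += mu * e * x / (x @ x + eps)        # normalised LMS update
        out[n] = e
    return out

# toy example: echo is a delayed, attenuated copy of the far-end signal
rng = np.random.default_rng(0)
far = rng.standard_normal(16000)
echo = 0.6 * np.concatenate([np.zeros(40), far[:-40]])
mic = echo + 0.05 * rng.standard_normal(16000)
residual = nlms_echo_canceller(far, mic)
# Stage 2 (not shown): feed STFTs of far, mic and residual to a UNet post-filter.
```

As in the paper's pipeline, estimating and compensating the delay between the far-end reference and the microphone signal before the adaptive filter generally improves how much echo the linear stage can remove.
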
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

16 pages, 5822 KiB  
Article
Visual Explanations of Deep Learning Architectures in Predicting Cyclic Alternating Patterns Using Wavelet Transforms
by Ankit Gupta, Fábio Mendonça, Sheikh Shanawaz Mostafa, Antonio G. Ravelo-García and Fernando Morgado-Dias
Electronics 2023, 12(13), 2954; https://doi.org/10.3390/electronics12132954 - 05 Jul 2023
Viewed by 843
Abstract
Cyclic Alternating Pattern (CAP) is a sleep instability marker defined by the amplitude and frequency of the electroencephalogram signal. Because labeling the data is a time-consuming and labor-intensive process, various machine learning and automatic approaches have been proposed. However, due to the low accuracy of traditional approaches and the black-box nature of machine learning approaches, the proposed systems remain untrusted by physicians. This study contributes to accurately estimating CAP in the time-frequency domain by predicting the A-phase and its subtypes, transforming the monopolar derived electroencephalogram signals into corresponding scalograms. Subsequently, various computer vision classifiers were tested for the A-phase using scalogram images. MobileNetV2 outperformed all other tested classifiers, achieving average accuracy, sensitivity, and specificity values of 0.80, 0.75, and 0.81, respectively. The trained MobileNetV2 model was further fine-tuned for A-phase subtype prediction. To verify the visual ability of the trained models, Grad-CAM++ was employed to identify the regions targeted by the trained network. The areas identified by the model match the regions focused on by sleep experts for A-phase prediction, supporting the approach's clinical viability and robustness. This motivates the development of novel deep-learning-based methods for predicting CAP patterns. Full article
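
A minimal sketch of the scalogram-plus-CNN idea: a continuous wavelet transform turns an EEG segment into a time-frequency image that is fed to MobileNetV2. The Morlet wavelet, scale range, segment length, and two-class head are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
import pywt
import torch
from torchvision.models import mobilenet_v2

def eeg_to_scalogram(eeg: np.ndarray, fs: float = 100.0,
                     n_scales: int = 64) -> np.ndarray:
    """Continuous wavelet transform of a 1-D EEG segment into a scalogram."""
    scales = np.arange(1, n_scales + 1)                 # illustrative scale range
    coeffs, _ = pywt.cwt(eeg, scales, 'morl', sampling_period=1.0 / fs)
    mag = np.abs(coeffs)
    return (mag - mag.min()) / (mag.max() - mag.min() + 1e-8)

# one 2-second EEG segment at 100 Hz (random stand-in data)
segment = np.random.randn(200)
scalogram = eeg_to_scalogram(segment)                   # (64, 200)

# replicate to 3 channels and resize for an ImageNet-style classifier
img = torch.tensor(scalogram, dtype=torch.float32).unsqueeze(0).repeat(3, 1, 1)
img = torch.nn.functional.interpolate(img.unsqueeze(0), size=(224, 224))

model = mobilenet_v2(num_classes=2)                     # A-phase vs. not-A-phase
logits = model(img)
```
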
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

13 pages, 2656 KiB  
Article
Fast Mode Decision Method of Multiple Weighted Bi-Predictions Using Lightweight Multilayer Perceptron in Versatile Video Coding
by Taesik Lee and Dongsan Jun
Electronics 2023, 12(12), 2685; https://doi.org/10.3390/electronics12122685 - 15 Jun 2023
Cited by 1 | Viewed by 862
Abstract
Versatile Video Coding (VVC), the state-of-the-art video coding standard, was developed by the Joint Video Experts Team (JVET) of ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG) in 2020. Although VVC can provide powerful coding performance, it requires tremendous computational complexity to determine the optimal mode decision during the encoding process. In particular, VVC adopted the bi-prediction with CU-level weight (BCW) as one of the new tools, which enhanced the coding efficiency of conventional bi-prediction by assigning different weights to the two prediction blocks in the process of inter prediction. In this study, we investigate the statistical characteristics of input features that exhibit a correlation with the BCW and define four useful types of categories to facilitate the inter prediction of VVC. With the investigated input features, a lightweight neural network with multilayer perceptron (MLP) architecture is designed to provide high accuracy and low complexity. We propose a fast BCW mode decision method with a lightweight MLP to reduce the computational complexity of the weighted multiple bi-prediction in the VVC encoder. The experimental results show that the proposed method significantly reduced the BCW encoding complexity by up to 33% with unnoticeable coding loss, compared to the VVC test model (VTM) under the random-access (RA) configuration. Full article
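
A minimal sketch of the lightweight-MLP idea: a small network maps per-CU features to a probability that the extra BCW weight candidates can be skipped. The feature set, layer sizes, and decision threshold are assumptions, not the values used in the paper.

```python
import torch
import torch.nn as nn

class BCWSkipMLP(nn.Module):
    """Tiny MLP deciding whether to skip the extra BCW weight candidates."""
    def __init__(self, n_features: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(),
            nn.Linear(16, 8), nn.ReLU(),
            nn.Linear(8, 1))                     # score for "default weight is enough"

    def forward(self, x):
        return torch.sigmoid(self.net(x))

# hypothetical per-CU features (e.g. default bi-prediction cost, CU size, QP, ...)
features = torch.randn(4, 8)
model = BCWSkipMLP()
p_skip = model(features)
search_all_weights = (p_skip < 0.5).squeeze(1)   # only these CUs test all BCW weights
```
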
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

19 pages, 25231 KiB  
Article
Federated Learning Approach for Early Detection of Chest Lesion Caused by COVID-19 Infection Using Particle Swarm Optimization
by Dasaradharami Reddy Kandati and Thippa Reddy Gadekallu
Electronics 2023, 12(3), 710; https://doi.org/10.3390/electronics12030710 - 31 Jan 2023
Cited by 14 | Viewed by 1809
Abstract
The chest lesions caused by COVID-19 infection during the pandemic are threatening the lives and well-being of people all over the world. Artificial intelligence (AI)-based strategies are efficient methods for helping radiologists assess the vast number of chest X-ray images, and they may play a significant role in simplifying and improving the diagnosis of chest lesions caused by COVID-19 infection. Machine learning (ML) and deep learning (DL) are AI strategies that have helped researchers predict cases of chest lesions caused by COVID-19 infection. However, ML and DL strategies face challenges such as transmission delays, a lack of computing power, communication delays, and privacy concerns. Federated Learning (FL) is a recent development in ML that makes it easier to collect, process, and analyze large amounts of multidimensional data, which could help solve the challenges identified in ML and DL. However, FL algorithms send and receive large amounts of weights from client-side trained models, resulting in significant communication overhead. To address this problem, we offer a unified framework combining FL and a particle swarm optimization (PSO) algorithm to speed up the government's response time to outbreaks of chest lesions caused by COVID-19 infection. The Federated Particle Swarm Optimization approach is tested on a multidimensional image dataset of chest lesions caused by COVID-19 infection and the chest X-ray (pneumonia) dataset from Kaggle's repository. Our research shows that the proposed model performs better when the amount of data is unevenly distributed, has lower communication costs, and is therefore more efficient from a network point of view. The results of the proposed approach were validated: 96.15% prediction accuracy was achieved for the dataset of chest lesions caused by COVID-19 infection, and 96.55% prediction accuracy was achieved for the chest X-ray (pneumonia) dataset. These results can be used to develop a progressive approach for the early detection of chest lesions caused by COVID-19 infection. Full article
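
One way to picture the communication saving is a round in which every client reports only a scalar fitness score and the server pulls full weights from the single best client. The sketch below illustrates that round structure under this assumption; it is not the authors' exact Federated PSO algorithm, and `local_train` is a placeholder.

```python
import numpy as np

def local_train(weights: np.ndarray, client_id: int) -> tuple[np.ndarray, float]:
    """Placeholder local update: returns updated weights and a validation score."""
    rng = np.random.default_rng(client_id)
    updated = weights + 0.01 * rng.standard_normal(weights.shape)
    score = float(rng.random())                 # stand-in for validation accuracy
    return updated, score

def federated_round(global_weights: np.ndarray, n_clients: int = 5) -> np.ndarray:
    """One round: every client sends only its score; weights are fetched
    from the single best client, cutting per-round communication."""
    scores, local_weights = [], []
    for cid in range(n_clients):
        w, s = local_train(global_weights.copy(), cid)
        local_weights.append(w)                 # kept on the client in practice
        scores.append(s)
    best = int(np.argmax(scores))               # server selects the global best
    return local_weights[best]                  # only this model is transmitted

weights = np.zeros(1000)                        # flattened model parameters
for _ in range(3):
    weights = federated_round(weights)
```
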
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

15 pages, 3771 KiB  
Article
Facial Emotion Recognition with Inter-Modality-Attention-Transformer-Based Self-Supervised Learning
by Aayushi Chaudhari, Chintan Bhatt, Achyut Krishna and Carlos M. Travieso-González
Electronics 2023, 12(2), 288; https://doi.org/10.3390/electronics12020288 - 05 Jan 2023
Cited by 8 | Viewed by 2819
Abstract
Emotion recognition is a very challenging research field due to its complexity, as cognitive–emotional cues differ between individuals and are expressed in a wide variety of ways, including language, facial expressions, and speech. If video is used as the input, a plethora of data is available for analyzing human emotions. In this research, we use features derived from separately pretrained self-supervised learning models to combine the text, audio (speech), and visual data modalities. The fusion of features and representations is the biggest challenge in multimodal emotion classification research. Because of the large dimensionality of the self-supervised learning features, we present a transformer- and attention-based fusion method for incorporating multimodal self-supervised learning features, which achieved an accuracy of 86.40% for multimodal emotion classification. Full article
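
A minimal sketch of attention-based fusion: per-modality self-supervised features are projected to a shared width, stacked as a short token sequence, and passed through a Transformer encoder before classification. The feature dimensions, encoder depth, and seven-class head are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, dims=(768, 512, 1024), d_model=256, n_classes=7):
        super().__init__()
        # project text / audio / visual SSL features to a common width
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, text, audio, visual):
        tokens = torch.stack([p(x) for p, x in
                              zip(self.proj, (text, audio, visual))], dim=1)
        fused = self.encoder(tokens)             # cross-modality self-attention
        return self.head(fused.mean(dim=1))      # pooled emotion logits

model = MultimodalFusion()
logits = model(torch.randn(2, 768), torch.randn(2, 512), torch.randn(2, 1024))
```
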
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

24 pages, 5307 KiB  
Article
Rule-Based Embedded HMMs Phoneme Classification to Improve Qur’anic Recitation Recognition
by Ammar Mohammed Ali Alqadasi, Mohd Shahrizal Sunar, Sherzod Turaev, Rawad Abdulghafor, Md Sah Hj Salam, Abdulaziz Ali Saleh Alashbi, Ali Ahmed Salem and Mohammed A. H. Ali
Electronics 2023, 12(1), 176; https://doi.org/10.3390/electronics12010176 - 30 Dec 2022
Cited by 4 | Viewed by 2895
Abstract
Phoneme classification performance is a critical factor for the successful implementation of a speech recognition system. A mispronunciation of Arabic short or long vowels can change the meaning of a complete sentence. However, correctly distinguishing phonemes with vowels in the recitation of the Qur'an (the Holy book of Muslims) is still a challenging problem, even for state-of-the-art classification methods. Phoneme duration is one of the important features of Qur'anic recitation: under the rules of Medd, phoneme lengthening is strictly governed. These features of recitation call for an additional classification of phonemes in Qur'anic recitation, because phoneme classification based on Arabic language characteristics alone is insufficient to recognize Tajweed rules, including the rules of Medd. This paper introduces a Rule-Based Phoneme Duration Algorithm to improve phoneme classification in Qur'anic recitation. The phonemes of the Qur'anic dataset, containing 21 Ayats collected from 30 reciters, are carefully analyzed using a baseline HMM-based speech recognition model. Using a Hidden Markov Model with tied-state triphones, a set of phoneme classification models optimized based on duration is constructed and integrated into a Qur'anic phoneme classification method. The proposed algorithm achieved outstanding accuracy, ranging from 99.87% to 100% depending on the Medd type. The results obtained with the proposed algorithm will contribute significantly to Qur'anic recitation recognition models. Full article
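
The duration rules can be pictured as a post-processing step on the HMM forced alignment: each vowel's duration, expressed in multiples of the reciter's base vowel length, is mapped to a Medd category. The thresholds in the sketch below are hypothetical placeholders, not the rules formalised in the paper.

```python
def classify_medd(duration_s: float, base_vowel_s: float) -> str:
    """Map an aligned vowel duration to a Medd (lengthening) category.

    duration_s:   phoneme duration from the HMM forced alignment.
    base_vowel_s: reciter-specific duration of one short vowel (one beat).
    Thresholds are illustrative assumptions.
    """
    beats = duration_s / base_vowel_s
    if beats < 1.5:
        return "short vowel (no Medd)"
    if beats < 3.0:
        return "Medd ~2 beats"
    if beats < 5.0:
        return "Medd ~4 beats"
    return "Medd ~6 beats"

# example: alignment says the vowel lasted 0.62 s, base vowel length is 0.15 s
print(classify_medd(0.62, 0.15))   # -> "Medd ~4 beats"
```
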
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

23 pages, 1490 KiB  
Article
Detection of Diseases in Pandemic: A Predictive Approach Using Stack Ensembling on Multi-Modal Imaging Data
by Rabeea Mansoor, Munam Ali Shah, Hasan Ali Khattak, Shafaq Mussadiq, Hafiz Tayyab Rauf and Zoobia Ameer
Electronics 2022, 11(23), 3974; https://doi.org/10.3390/electronics11233974 - 30 Nov 2022
Cited by 2 | Viewed by 1778
Abstract
Deep Learning (DL) in Medical Imaging is an emerging technology for diagnosing various diseases, e.g., pneumonia, lung cancer, brain stroke, and breast cancer. In Machine Learning (ML) and traditional data mining approaches, feature extraction is performed before building a predictive model, which is a cumbersome task. In the case of complex data, there are many challenges, such as insufficient domain knowledge when performing feature engineering. With the advancement in the application of Artificial Neural Networks (ANNs) and DL, ensemble learning is an essential foundation for developing an automated diagnostic system. Medical Imaging with different modalities is effective for the detailed analysis of various chronic diseases, in which the healthy and infected scans of multiple organs are compared and analyzed. In this study, the transfer learning approach is applied to train 15 state-of-the-art DL models on three datasets (X-ray, CT-scan and Ultrasound) for predicting diseases. The performance of these models is evaluated and compared. Furthermore, a two-level stack ensembling of fine-tuned DL models is proposed. The DL models with the best performance among the 15 are used for stacking in the first level. A Support Vector Machine (SVM) is used in Level 2 as a meta-classifier to predict the result as one of the following: pandemic positive (1) or negative (0). The proposed architecture achieved 98.3%, 98.2% and 99% accuracy for datasets D1, D2 and D3, respectively, outperforming existing research. These experimental results and findings can be considered helpful tools for pandemic screening on chest X-rays, CT scan images and ultrasound images of infected patients. This architecture aims to provide clinicians with more accurate results. Full article
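
A minimal sketch of two-level stacking: out-of-fold class probabilities from the level-1 models form the feature matrix for an SVM meta-classifier. Generic scikit-learn classifiers stand in for the fine-tuned DL models, and the dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level 1: base learners (stand-ins for the fine-tuned DL models)
bases = [LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)]

# out-of-fold probabilities on the training set avoid leaking labels to level 2
meta_tr = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in bases])
meta_te = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in bases])

# Level 2: SVM meta-classifier outputs pandemic positive (1) / negative (0)
meta = SVC().fit(meta_tr, y_tr)
print("stacked accuracy:", meta.score(meta_te, y_te))
```

Using out-of-fold predictions at level 1 keeps the meta-classifier from fitting to labels the base models have already memorised.
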
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)

Review


42 pages, 9266 KiB  
Review
Evolution of Crack Analysis in Structures Using Image Processing Technique: A Review
by Zakrya Azouz, Barmak Honarvar Shakibaei Asli and Muhammad Khan
Electronics 2023, 12(18), 3862; https://doi.org/10.3390/electronics12183862 - 12 Sep 2023
Cited by 1 | Viewed by 2531
Abstract
Structural health monitoring (SHM) involves the control and analysis of mechanical systems to monitor the variation of geometric features of engineering structures. Damage processing is one of the issues that can be addressed using several techniques derived from image processing. There are two types of SHM: contact-based and non-contact methods. Sensors, cameras, and accelerometers are examples of contact-based SHM, whereas photogrammetry, infrared thermography, and laser imaging are non-contact SHM techniques. In this review, our focus centres on image processing algorithms that identify cracks and analyze their properties in order to detect damage. Based on the literature review, several preprocessing approaches are covered, including image enhancement, image filtering to remove noise and blur, and dynamic response measurement to predict crack propagation. Full article
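
As a rough illustration of the preprocessing-plus-edge-analysis pipelines the review surveys, the sketch below denoises an image, extracts edges, closes small gaps, and measures candidate crack regions with OpenCV. Thresholds, kernel sizes, and the elongation heuristic are illustrative assumptions, and the image path is a placeholder.

```python
import cv2
import numpy as np

def detect_cracks(image_path: str, min_area: int = 50):
    """Simple crack-candidate extraction: blur -> Canny -> morphology -> contours."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)               # suppress sensor noise
    edges = cv2.Canny(blurred, 50, 150)                       # crack edges
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)  # bridge small gaps
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    cracks = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < min_area:
            continue
        x, y, w, h = cv2.boundingRect(c)
        elongation = max(w, h) / (min(w, h) + 1e-6)            # cracks are long and thin
        cracks.append({"bbox": (x, y, w, h), "area": area,
                       "elongation": elongation})
    return cracks

# print(detect_cracks("wall.jpg"))   # "wall.jpg" is a placeholder path
```
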
(This article belongs to the Special Issue Signal and Image Processing Applications in Artificial Intelligence)