Biomedical Signal and Image Processing in Speech Analysis

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Biomedical Sensors".

Deadline for manuscript submissions: closed (30 June 2022) | Viewed by 9362

Special Issue Editors


Dr. Silvia Orlandi
Guest Editor
Assistant Professor of Biomedical Engineering, Department of Electrical, Electronic, and Information Engineering — Guglielmo Marconi (DEI), Alma Mater Studiorum - Università di Bologna, Viale Risorgimento, 2 - 40136 Bologna, Italy
Interests: Biomedical Engineering; Biomedical Signal and Image Processing; Rehabilitation Engineering; Speech Science; Computer Vision; Cerebral Palsy; Neurodevelopmental Disorders

Dr. Andrea Bandini
Guest Editor
KITE – Toronto Rehabilitation Institute – University Health Network, Toronto, ON, Canada
The BioRobotics Institute, Scuola Superiore Sant’Anna, Pisa, Italy
Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, Pisa, Italy
Interests: Biomedical Engineering; Computer Vision; Biomedical Signal Processing; Speech Analysis; Motor Speech Disorders; Neurodegenerative Diseases; Rehabilitation Engineering

Special Issue Information

Dear Colleagues,

Speech is a predominant means of communication for human beings, and its understanding via multidisciplinary approaches can shed light on the underlying physiological mechanisms of its production and perception. In the past decade, signal and image processing techniques, along with the advent of deep learning, have revolutionized the way speech is analysed, with numerous applications in healthcare, workplaces, marketing, language learning, and education. With the rise of digital voice assistants, artificial intelligence for speech analysis is part of our daily lives, but the available technology does not always fit the needs of all users, especially those with speech impairments or neurological disabilities. Automatic techniques for the analysis of speech, voice, and language have been used in the non-invasive assessment, diagnosis, and treatment of speech and neurological disorders, newborn cry analysis, human–machine interaction, speech and voice production modelling, analysis of the physiological and neurophysiological mechanisms of speech production and perception, speech recognition, language analysis, and many more applications.

The aim of this Special Issue is to present recent advancements in the field of image and signal processing for speech analysis. In particular, the Special Issue will report on various types of processing techniques and applications which, in conjunction with machine and deep learning, provide solutions to real-world problems related to speech and communication. Authors are encouraged to submit manuscripts for publication in (but not limited to) the following areas:

  • Artificial intelligence for speech and voice analysis;
  • Automatic techniques for the assessment of neurological disorders;
  • Diagnosis of speech and voice disorders;
  • Automatic techniques for hearing disorders;
  • Automatic analysis of newborn cry;
  • Social and behavioural applications;
  • Natural language processing;
  • Speech production and synthesis;
  • Speech perception modelling;
  • Speaker verification and identification;
  • Applications of commercially available speech technologies;
  • Dialect and accent recognition;
  • Speech analysis for human–computer interaction;
  • Imaging and neuro-imaging techniques for speech analysis;
  • Emerging techniques and hardware for speech and voice analysis;
  • Audio and video datasets.

Submitted articles should not have been previously published or be currently under review by other journals or conferences. If any portions of the submitted manuscript have previously appeared or will appear in a conference proceeding, authors should declare this at the time of submission in a separate letter to the guest editors. Authors must also provide a copy of the previous conference publication and indicate how the journal version of their paper has been extended, to assist the guest editors and reviewers in differentiating between the manuscripts. Moreover, authors must resolve any potential copyright issues prior to submission.

We look forward to receiving your exciting papers!

Dr. Silvia Orlandi
Dr. Andrea Bandini
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Speech analysis
  • Voice analysis
  • Speech and hearing disorders
  • Cry analysis
  • Speech recognition
  • Natural language processing
  • Neurological disorders
  • Speech production and perception
  • Speaker identification
  • Speech technologies
  • Datasets

Published Papers (4 papers)


Research

17 pages, 513 KiB  
Article
Assessing Cognitive Workload Using Cardiovascular Measures and Voice
by Eydis H. Magnusdottir, Kamilla R. Johannsdottir, Arnab Majumdar and Jon Gudnason
Sensors 2022, 22(18), 6894; https://doi.org/10.3390/s22186894 - 13 Sep 2022
Cited by 4 | Viewed by 1473
Abstract
Monitoring cognitive workload has the potential to improve both the performance and fidelity of human decision making. However, previous efforts towards discriminating further than binary levels (e.g., low/high or neutral/high) in cognitive workload classification have not been successful. This lack of sensitivity in cognitive workload measurements might be due to individual differences as well as inadequate methodology used to analyse the measured signal. In this paper, a method that combines the speech signal with cardiovascular measurements for screen and heartbeat classification is introduced. For validation, speech and cardiovascular signals from 97 university participants and 20 airline pilot participants were collected while cognitive stimuli of varying difficulty level were induced with the Stroop colour/word test. For the trinary classification scheme (low, medium, high cognitive workload) the prominent result using classifiers trained on each participant achieved 15.17 ± 0.79% and 17.38 ± 1.85% average misclassification rates indicating good discrimination at three levels of cognitive workload. Combining cardiovascular and speech measures synchronized to each heartbeat and consolidated with short-term dynamic measures might therefore provide enhanced sensitivity in cognitive workload monitoring. The results show that the influence of individual differences is a limiting factor for a generic classification and highlights the need for research to focus on methods that incorporate individual differences to achieve even better results. This method can potentially be used to measure and monitor workload in real time in operational environments. Full article
(This article belongs to the Special Issue Biomedical Signal and Image Processing in Speech Analysis)
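For readers who want to experiment with this kind of participant-dependent, three-level workload classification, a minimal sketch is shown below. It is not the authors' pipeline: the feature matrix X (standing in for heartbeat-synchronized cardiovascular and speech measures), the labels y, and the SVM classifier are hypothetical placeholders chosen only to illustrate the per-participant training scheme.

```python
# Illustrative sketch only -- not the published method.
# Assumes per-heartbeat feature rows fusing cardiovascular and speech measures
# for a single participant (placeholder arrays X, y).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))       # 300 heartbeats x 12 fused features (placeholder)
y = rng.integers(0, 3, size=300)     # 0 = low, 1 = medium, 2 = high workload (placeholder)

# One classifier per participant, mirroring the participant-dependent scheme
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean misclassification rate: {1 - scores.mean():.3f}")
```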

11 pages, 2280 KiB  
Article
Predictions for Three-Month Postoperative Vocal Recovery after Thyroid Surgery from Spectrograms with Deep Neural Network
by Jeong Hoon Lee, Chang Yoon Lee, Jin Seop Eom, Mingun Pak, Hee Seok Jeong and Hee Young Son
Sensors 2022, 22(17), 6387; https://doi.org/10.3390/s22176387 - 24 Aug 2022
Cited by 6 | Viewed by 1573
Abstract
Despite the lack of findings in laryngeal endoscopy, it is common for patients to undergo vocal problems after thyroid surgery. This study aimed to predict the recovery of the patient’s voice after 3 months from preoperative and postoperative voice spectrograms. We retrospectively collected voice and the GRBAS score from 114 patients undergoing surgery with thyroid cancer. The data for each patient were taken from three points in time: preoperative, and 2 weeks and 3 months postoperative. Using the pretrained model to predict GRBAS as the backbone, the preoperative and 2-weeks-postoperative voice spectrogram were trained for the EfficientNet architecture deep-learning model with long short-term memory (LSTM) to predict the voice at 3 months postoperation. The correlation analysis of the predicted results for the grade, breathiness, and asthenia scores were 0.741, 0.766, and 0.433, respectively. Based on the scaled prediction results, the area under the receiver operating characteristic curve for the binarized grade, breathiness, and asthenia were 0.894, 0.918, and 0.735, respectively. In the follow-up test results for 12 patients after 6 months, the average of the AUC values for the five scores was 0.822. This study showed the feasibility of predicting vocal recovery after 3 months using the spectrogram. We expect this model could be used to relieve patients’ psychological anxiety and encourage them to actively participate in speech rehabilitation. Full article
(This article belongs to the Special Issue Biomedical Signal and Image Processing in Speech Analysis)
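A rough PyTorch sketch of the kind of spectrogram-sequence model described above (an EfficientNet feature extractor followed by an LSTM) is given below. It is an assumption-laden illustration rather than the published model: the input resolution, hidden size, and five-score output head are placeholders, and no pretrained GRBAS backbone is loaded.

```python
# Illustrative sketch only; architecture details and weights are assumptions.
# Treats each patient's pre- and post-operative voice spectrograms as a short image sequence.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class SpectrogramGRBAS(nn.Module):
    def __init__(self, hidden=128, n_scores=5):
        super().__init__()
        backbone = efficientnet_b0(weights=None)   # a pretrained backbone could be loaded here
        backbone.classifier = nn.Identity()        # keep the 1280-d pooled features
        self.backbone = backbone
        self.lstm = nn.LSTM(1280, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_scores)    # e.g., one output per GRBAS score

    def forward(self, x):                          # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        feats = self.backbone(x.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])               # predict follow-up scores from the last step

model = SpectrogramGRBAS()
dummy = torch.randn(2, 2, 3, 224, 224)             # two patients, two time points each
print(model(dummy).shape)                          # torch.Size([2, 5])
```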

16 pages, 5382 KiB  
Article
Artificial Neural Networks Combined with the Principal Component Analysis for Non-Fluent Speech Recognition
by Izabela Świetlicka, Wiesława Kuniszyk-Jóźkowiak and Michał Świetlicki
Sensors 2022, 22(1), 321; https://doi.org/10.3390/s22010321 - 1 Jan 2022
Cited by 12 | Viewed by 2423
Abstract
The presented paper introduces principal component analysis application for dimensionality reduction of variables describing speech signal and applicability of obtained results for the disturbed and fluent speech recognition process. A set of fluent speech signals and three speech disturbances—blocks before words starting with plosives, syllable repetitions, and sound-initial prolongations—was transformed using principal component analysis. The result was a model containing four principal components describing analysed utterances. Distances between standardised original variables and elements of the observation matrix in a new system of coordinates were calculated and then applied in the recognition process. As a classifying algorithm, the multilayer perceptron network was used. Achieved results were compared with outcomes from previous experiments where speech samples were parameterised with the Kohonen network application. The classifying network achieved overall accuracy at 76% (from 50% to 91%, depending on the dysfluency type). Full article
(This article belongs to the Special Issue Biomedical Signal and Image Processing in Speech Analysis)
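As a simple illustration of the PCA-plus-neural-network idea (not the authors' feature set, network topology, or data), the following sketch standardises a hypothetical acoustic feature matrix, projects it onto four principal components, and classifies utterances with a multilayer perceptron:

```python
# Illustrative sketch only, assuming a pre-extracted feature matrix X (acoustic
# parameters per utterance) and labels y (fluent vs. three dysfluency types).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))       # placeholder: 200 utterances x 40 acoustic variables
y = rng.integers(0, 4, size=200)     # placeholder: 0 = fluent, 1-3 = dysfluency types

# Standardise, reduce to four principal components, classify with an MLP
pipe = make_pipeline(StandardScaler(),
                     PCA(n_components=4),
                     MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))
print("cross-validated accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```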

30 pages, 10696 KiB  
Article
A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation
by Mohammad Al-Qaderi, Elfituri Lahamer and Ahmad Rad
Sensors 2021, 21(15), 5097; https://doi.org/10.3390/s21155097 - 28 Jul 2021
Cited by 10 | Viewed by 2398
Abstract
We present a new architecture to address the challenges of speaker identification that arise in interaction of humans with social robots. Though deep learning systems have led to impressive performance in many speech applications, limited speech data at training stage and short utterances with background noise at test stage present challenges and are still open problems as no optimum solution has been reported to date. The proposed design employs a generative model namely the Gaussian mixture model (GMM) and a discriminative model—support vector machine (SVM) classifiers as well as prosodic features and short-term spectral features to concurrently classify a speaker’s gender and his/her identity. The proposed architecture works in a semi-sequential manner consisting of two stages: the first classifier exploits the prosodic features to determine the speaker’s gender which in turn is used with the short-term spectral features as inputs to the second classifier system in order to identify the speaker. The second classifier system employs two types of short-term spectral features; namely mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC) as well as gender information as inputs to two different classifiers (GMM and GMM supervector-based SVM) which in total leads to construction of four classifiers. The outputs from the second stage classifiers; namely GMM-MFCC maximum likelihood classifier (MLC), GMM-GFCC MLC, GMM-MFCC supervector SVM, and GMM-GFCC supervector SVM are fused at score level by the weighted Borda count approach. The weight factors are computed on the fly via Mamdani fuzzy inference system that its inputs are the signal to noise ratio and the length of utterance. Experimental evaluations suggest that the proposed architecture and the fusion framework are promising and can improve the recognition performance of the system in challenging environments where the signal-to-noise ratio is low, and the length of utterance is short; such scenarios often arise in social robot interactions with humans. Full article
(This article belongs to the Special Issue Biomedical Signal and Image Processing in Speech Analysis)
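The score-level fusion step can be illustrated with a toy sketch: per-speaker GMMs score an utterance on two complementary feature streams (stand-ins for MFCC and GFCC), and the two rankings are merged with a weighted Borda count. Everything here (random features, four speakers, fixed fusion weights) is a placeholder; in the paper the weights come from a fuzzy inference system driven by SNR and utterance length, and the gender stage and supervector SVMs are omitted.

```python
# Toy sketch of weighted Borda-count fusion over per-speaker GMM scores.
# Features, speaker count, and weights are placeholders, not the paper's setup.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n_speakers, n_frames, n_mfcc, n_gfcc = 4, 200, 13, 13

# One GMM per speaker and per feature type (MFCC-like and GFCC-like placeholders)
train = {s: (rng.normal(s, 1, (n_frames, n_mfcc)), rng.normal(s, 1, (n_frames, n_gfcc)))
         for s in range(n_speakers)}
gmm_mfcc = {s: GaussianMixture(4, random_state=0).fit(f[0]) for s, f in train.items()}
gmm_gfcc = {s: GaussianMixture(4, random_state=0).fit(f[1]) for s, f in train.items()}

def borda_fusion(test_mfcc, test_gfcc, w_mfcc=0.5, w_gfcc=0.5):
    """Rank speakers under each stream, then combine the ranks with a weighted Borda count."""
    ll_m = np.array([gmm_mfcc[s].score(test_mfcc) for s in range(n_speakers)])
    ll_g = np.array([gmm_gfcc[s].score(test_gfcc) for s in range(n_speakers)])
    ranks_m = ll_m.argsort().argsort()   # higher likelihood -> more Borda points
    ranks_g = ll_g.argsort().argsort()
    return int(np.argmax(w_mfcc * ranks_m + w_gfcc * ranks_g))

test = (rng.normal(2, 1, (50, n_mfcc)), rng.normal(2, 1, (50, n_gfcc)))
print("identified speaker:", borda_fusion(*test))   # weights could instead come from a fuzzy system
```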
