Computational Methods and Engineering Solutions to Voice III

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 September 2023) | Viewed by 10122

Special Issue Editor


Guest Editor
Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
Interests: fluid–structure–acoustic interaction (FSAI) in voice production; clinical transfer of new analysis methods and technologies

Special Issue Information

Dear Colleagues,

Today, voice and speech research is no longer limited to acoustic, medical, and clinical studies. Approaches from fields such as mathematics, computer science, artificial intelligence, fluid dynamics, mechatronics, and biology are applied to gain new insights into, and a better understanding of, the physiological and pathological laryngeal processes within voice and speech production. Building on fruitful interdisciplinary research collaborations, many new approaches have been proposed during the last decade. These include, for example, highly advanced numerical models (FEM/FVM models), as well as tissue engineering and machine-learning-based data analysis approaches. The purpose of this Special Issue is to provide an overview of the newest and most innovative techniques applied in our field. Young colleagues are especially encouraged to submit their work. Authors are invited to submit work related to the following topics, applying mathematical, engineering, computer science, and biological methods, as well as new hardware developments within the field of voice and speech production.

Prof. Dr. Michael Döllinger
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computational modeling
  • experimental modeling
  • computational fluid dynamics
  • fluid–structure–acoustic interactions
  • image processing
  • advanced data analysis
  • machine learning
  • new technologies
  • tissue engineering
  • molecular biology
  • acoustic analysis
  • new measurement hardware

Published Papers (9 papers)


Research

15 pages, 2016 KiB  
Article
Assessing Gender Bias in Auditory-Perceptual Ratings of Tracheoesophageal Speakers
by Jenna L. Bucci, Nedeljko Jovanovic and Philip C. Doyle
Appl. Sci. 2024, 14(8), 3447; https://doi.org/10.3390/app14083447 - 19 Apr 2024
Viewed by 184
Abstract
Objective: This study examined the relationship between gender and auditory-perceptual evaluation of tracheoesophageal (TE) speech. Method: We collected auditory-perceptual judgments of two features, speech acceptability and listener comfort, from normal-hearing young adult listeners (n = 16) who were naïve to TE speech. Auditory-perceptual judgments were made for 12 TE speakers (6 men and 6 women) on two occasions separated by between 7 and 14 days. During the first session, listeners were deceived about the gender of the voice samples presented, and in the second session, listeners were informed of the true gender of the voice samples. Results: The findings suggest that a gender bias exists in perceptions of TE speech, and that female TE speakers tend to be disproportionately penalized when compared to their male counterparts when gender is known. Conclusions: These data provide insights into the potential influence of speaker gender on listener judgments of TE speech and the impact that such factors may have on communication. Our data indicate that listeners rate female TE speaker samples as less acceptable and less comfortable to listen to when the samples are known to be female speakers.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)

18 pages, 4569 KiB  
Article
Deep Learning for Neuromuscular Control of Vocal Source for Voice Production
by Anil Palaparthi, Rishi K. Alluri and Ingo R. Titze
Appl. Sci. 2024, 14(2), 769; https://doi.org/10.3390/app14020769 - 16 Jan 2024
Viewed by 668
Abstract
A computational neuromuscular control system that generates lung pressure and three intrinsic laryngeal muscle activations (cricothyroid, thyroarytenoid, and lateral cricoarytenoid) to control the vocal source was developed. In the current study, LeTalker, a biophysical computational model of the vocal system, was used as the physical plant. In the LeTalker, a three-mass vocal fold model was used to simulate self-sustained vocal fold oscillation. A constant /ə/ vowel was used for the vocal tract shape. The trachea was modeled after MRI measurements. The neuromuscular control system generates control parameters to achieve four acoustic targets (fundamental frequency, sound pressure level, normalized spectral centroid, and signal-to-noise ratio) and four somatosensory targets (vocal fold length, and longitudinal fiber stress in the three vocal fold layers). The deep-learning-based control system comprises one acoustic feedforward controller and two feedback (acoustic and somatosensory) controllers. Fifty thousand steady speech signals were generated using the LeTalker for training the control system. The results demonstrated that the control system was able to generate the lung pressure and the three muscle activations such that the four acoustic and four somatosensory targets were reached with high accuracy. After training, the motor command corrections from the feedback controllers were minimal compared to the feedforward controller except for thyroarytenoid muscle activation.

20 pages, 5457 KiB  
Article
Effect of Subglottic Stenosis on Expiratory Sound Using Direct Noise Calculation
by Biao Geng, Qian Xue, Scott Thomson and Xudong Zheng
Appl. Sci. 2023, 13(24), 13197; https://doi.org/10.3390/app132413197 - 12 Dec 2023
Viewed by 642
Abstract
Subglottic stenosis (SGS) is a rare yet potentially life-threatening condition that requires prompt identification and treatment. One of the primary symptoms of SGS is a respiratory sound that is tonal. To better understand the effect of SGS on expiratory sound, we used direct noise calculation to simulate sound production in a simplified axisymmetric configuration that included the trachea, the vocal folds, the supraglottal tract, and an open environmental space. This study focused on flow-sustained tones and explored the impact of various parameters, such as the SGS severity, the SGS distance, the flowrate, and the glottal opening size. It was found that the sound pressure level (SPL) of the expiratory sound increased with flowrate. SGS had little effect on the sound until its severity approached 75% and SPL increased rapidly as the severity approached 100%. The results also revealed that the tonal components of the sound predominantly came from hole tones and tract harmonics and their coupling. The spectra of the sound were greatly influenced by constricting the glottis, which suggests that respiratory tasks that involve maneuvers to change the glottal opening size could be useful in gathering more information on respiratory sound to aid in the diagnosis of subglottic stenosis.

17 pages, 4265 KiB  
Article
Examining the Quasi-Steady Airflow Assumption in Irregular Vocal Fold Vibration
by Xiaojian Wang, Xudong Zheng, Ingo R. Titze, Anil Palaparthi and Qian Xue
Appl. Sci. 2023, 13(23), 12691; https://doi.org/10.3390/app132312691 - 27 Nov 2023
Viewed by 542
Abstract
The quasi-steady flow assumption (QSFA) is commonly used in the field of biomechanics of phonation. It approximates time-varying glottal flow with steady flow solutions based on frozen glottal shapes, ignoring unsteady flow behaviors and vocal fold motion. This study examined the limitations of QSFA in human phonation using numerical methods by considering factors of phonation frequency, air inertance in the vocal tract, and irregular glottal shapes. Two sets of irregular glottal shapes were examined through dynamic, pseudo-static, and quasi-steady simulations. The differences between dynamic and quasi-steady/pseudo-static simulations were measured for glottal flow rate, glottal wall pressure, and sound spectrum to evaluate the validity of QSFA. The results show that errors in glottal flow rate and wall pressure predicted by QSFA were small at 100 Hz but significant at 500 Hz due to growing flow unsteadiness. Air inertia in the vocal tract worsened predictions when interacting with unsteady glottal flow. Flow unsteadiness also influenced the harmonic energy ratio, which is perceptually important. The effects of glottal shape and glottal wall motion on the validity of QSFA were found to be insignificant.

18 pages, 603 KiB  
Article
The Influence of Stimulus Composition and Scoring Method on Objective Listener Assessments of Tracheoesophageal Speech Accuracy
by Philip C. Doyle, Natasha Goncharenko and Jeff Searl
Appl. Sci. 2023, 13(17), 9701; https://doi.org/10.3390/app13179701 - 28 Aug 2023
Viewed by 500
Abstract
Introduction: This study investigated the influence of stimulus composition for three speech intelligibility word lists and two scoring methods on the speech accuracy judgments of five tracheoesophageal (TE) speakers. This was achieved through phonemic comparisons across TE speakers’ productions of stimuli from the three intelligibility word lists, including the (1) Consonant Rhyme Test, (2) Northwestern Intelligibility Test, and (3) the Weiss and Basili list. Methodology: Fifteen normal-hearing young adults served as listeners; all listeners were trained in phonetic transcription (IPA), but none had previous exposure to any mode of postlaryngectomy alaryngeal speech. Speaker stimuli were presented to all listeners through headphones, and all stimuli were transcribed phonetically using an open-set response paradigm. Data were analyzed for individual speakers by stimulus list. Phonemic scoring was compared to a whole-word scoring method, and the types of errors observed were quantified by word list. Results: Individual speaker variability was noted, and its effect on the assessment of speech accuracy was identified. The phonemic scoring method was found to be a more sensitive measure of TE speech accuracy. The W&B list yielded the lowest accuracy scores of the three lists. This finding may indicate its increased sensitivity and potential clinical value. Conclusions: Overall, this study supports the use of open-set, phonemic scoring methods when evaluating TE speaker intelligibility. Future research should aim to assess the specificity of assessment tools on a larger sample of TE speakers who vary in their speech proficiency.

15 pages, 1856 KiB  
Article
Interaction of Voice Onset Time with Vocal Hyperfunction and Voice Quality
by Maria Francisca de Paula Soares, Marília Sampaio and Meike Brockmann-Bauser
Appl. Sci. 2023, 13(15), 8956; https://doi.org/10.3390/app13158956 - 04 Aug 2023
Viewed by 1130
Abstract
The main aim of the present work was to investigate whether vocal hyperfunction (VH), perceptual voice quality (VQ), gender, and phonetic environment influence Voice Onset Time (VOT). The investigated group consisted of 30 adults, including 19 women (X = 46.1 ± 13.7 years) and 11 men (X = 47.5 ± 11.0 years), who had either phonotraumatic vocal hyperfunction (PVH) or non-phonotraumatic vocal hyperfunction (NPVH). VQ was judged considering the overall severity of dysphonia (OS) and the subcharacteristics of roughness, breathiness, and strain. Phonetic variables such as vowel stress, syllable stress, and mode of speech task were analyzed. Four samples of syllables with [p] plus vowel or diphthong were retrieved from CAPE-V sentence recordings. Acoustic analysis with Praat comprised VOT, mean fundamental frequency (fo), intensity (SPL dB(A)), and coefficient of variation of fundamental frequency (CV_fo %). VOT was significantly influenced by OS (p ≤ 0.001) but not by VH condition (PVH versus NPVH) (p = 0.90). However, CV_fo was affected by the VH condition (p = 0.02). Gender effects were only found for mean fo (p ≤ 0.001) and SPL (p = 0.01). All VQ subcharacteristics (OS, roughness, breathiness, and strain) correlated with VOT (p ≤ 0.001) and SPL (p ≤ 0.001) but not with fo. In summary, VOT was affected by voice quality, while it was not affected by vocal hyperfunction conditions. Therefore, VOT has the potential to objectively describe the onset of voicing in voice diagnostics, and may be one underlying objective characteristic of perceptual vocal quality.

18 pages, 2580 KiB  
Article
Alzheimer’s Dementia Speech (Audio vs. Text): Multi-Modal Machine Learning at High vs. Low Resolution
by Prachee Priyadarshinee, Christopher Johann Clarke, Jan Melechovsky, Cindy Ming Ying Lin, Balamurali B. T. and Jer-Ming Chen
Appl. Sci. 2023, 13(7), 4244; https://doi.org/10.3390/app13074244 - 27 Mar 2023
Cited by 2 | Viewed by 2513
Abstract
Automated techniques to detect Alzheimer’s Dementia through the use of audio recordings of spontaneous speech are now available with varying degrees of reliability. Here, we present a systematic comparison across different modalities, granularities and machine learning models to guide in choosing the most effective tools. Specifically, we present a multi-modal approach (audio and text) for the automatic detection of Alzheimer’s Dementia from recordings of spontaneous speech. Sixteen features, including four feature extraction methods (Energy–Time plots, Keg of Text Analytics, Keg of Text Analytics-Extended and Speech to Silence ratio) not previously applied in this context were tested to determine their relative performance. These features encompass two modalities (audio vs. text) at two resolution scales (frame-level vs. file-level). We compared the accuracy resulting from these features and found that text-based classification outperformed audio-based classification with the best performance attaining 88.7%, surpassing other reports to date relying on the same dataset. For text-based classification in particular, the best file-level feature performed 9.8% better than the frame-level feature. However, when comparing audio-based classification, the best frame-level feature performed 1.4% better than the best file-level feature. This multi-modal multi-model comparison at high- and low-resolution offers insights into which approach is most efficacious, depending on the sampling context. Such a comparison of the accuracy of Alzheimer’s Dementia classification using both frame-level and file-level granularities on audio and text modalities of different machine learning models on the same dataset has not been previously addressed. We also demonstrate that the subject’s speech captured in short time frames and their dynamics may contain enough inherent information to indicate the presence of dementia. Overall, such a systematic analysis facilitates the identification of Alzheimer’s Dementia quickly and non-invasively, potentially leading to more timely interventions and improved patient outcomes.

19 pages, 3880 KiB  
Article
Auditory Perception of Impulsiveness and Tonality in Vocal Fry
by Vinod Devaraj, Imme Roesner, Florian Wendt, Jean Schoentgen and Philipp Aichinger
Appl. Sci. 2023, 13(7), 4186; https://doi.org/10.3390/app13074186 - 25 Mar 2023
Cited by 2 | Viewed by 916
Abstract
Vocal fry is a voice quality that occurs in a healthy voice, but it can also be a sign of a voice disorder. In this study, we investigated the relationship between the parameters of voice production, a dedicated psychoacoustic feature, and the perceptual aspects of vocal fry. Two perceptual experiments were carried out to determine whether the fundamental frequency, the open quotient, and the glottal area pulse skewness affect the perception of vocal fry in synthetic vowels. Thirteen listeners participated in the perceptual experiments to assess the following attributes: binary fry (yes/no) and impulsiveness, tonality, and naturalness (7-point Likert scales). The results suggest that the perception of vocal fry is mainly triggered by a low fundamental frequency, but the open quotient also plays a role, with narrower glottal area pulses slightly increasing the probability of perceived fry. Perceived tonality is inversely related to perceived impulsiveness. Internal reference standards of listeners appear to have fixed elements but may also be affected by anchoring and the short-term (i.e., within-vowel) context of the stimuli. In addition, the prominence of the peaks observed in the loudness curve over time appears to be related to graduations of fry.

18 pages, 1256 KiB  
Article
Acoustic Voice and Speech Biomarkers of Treatment Status during Hospitalization for Acute Decompensated Heart Failure
by Olivia M. Murton, G. William Dec, Robert E. Hillman, Maulik D. Majmudar, Johannes Steiner, John V. Guttag and Daryush D. Mehta
Appl. Sci. 2023, 13(3), 1827; https://doi.org/10.3390/app13031827 - 31 Jan 2023
Cited by 1 | Viewed by 2376
Abstract
This study investigates acoustic voice and speech features as biomarkers for acute decompensated heart failure (ADHF), a serious escalation of heart failure symptoms including breathlessness and fatigue. ADHF-related systemic fluid accumulation in the lungs and laryngeal tissues is hypothesized to affect phonation and respiration for speech. A set of daily spoken recordings from 52 patients undergoing inpatient ADHF treatment was analyzed to identify voice and speech biomarkers for ADHF and to examine the trajectory of biomarkers during treatment. Results indicated that speakers produce more stable phonation, a more creaky voice, faster speech rates, and longer phrases after ADHF treatment compared to their pre-treatment voices. This project builds on work to develop a method of monitoring ADHF using speech biomarkers and presents a more detailed understanding of relevant voice and speech features.
