Research

15 pages, 2016 KiB

Open Access Article

Assessing Gender Bias in Auditory-Perceptual Ratings of Tracheoesophageal Speakers

by Jenna L. Bucci, Nedeljko Jovanovic and Philip C. Doyle

Appl. Sci. 2024, 14(8), 3447; https://doi.org/10.3390/app14083447 - 19 Apr 2024

Viewed by 184

Objective: This study examined the relationship between gender and auditory-perceptual evaluation of tracheoesophageal (TE) speech. Method: We collected auditory-perceptual judgments of two features, speech acceptability and listener comfort, from normal-hearing young adult listeners (n = 16) who were naïve to TE speech. Auditory-perceptual judgments were made for 12 TE speakers (6 men and 6 women) on two occasions separated by 7 to 14 days. During the first session, listeners were deceived about the gender of the voice samples presented; in the second session, they were informed of the true gender of the voice samples. Results: The findings suggest that a gender bias exists in perceptions of TE speech, and that female TE speakers tend to be disproportionately penalized relative to their male counterparts when gender is known. Conclusions: These data provide insights into the potential influence of speaker gender on listener judgments of TE speech and the impact that such factors may have on communication. Our data indicate that listeners rate female TE speaker samples as less acceptable and less comfortable to listen to when the samples are known to come from female speakers.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
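Because each listener rated the same speakers in both sessions, the effect of knowing speaker gender can be summarized as a within-listener change score. The sketch below is a minimal illustration, assuming a hypothetical flat list of (true speaker gender, deceived-session rating, informed-session rating) records for one feature; it is not the authors' statistical analysis, and the example ratings are invented.

```python
from statistics import mean


def session_change_by_gender(records):
    """Mean rating change (informed session minus deceived session) per true speaker gender.

    Each record is a hypothetical (gender, deceived_rating, informed_rating) tuple,
    one per listener-by-speaker judgment of a single feature
    (e.g., speech acceptability or listener comfort).
    """
    changes = {}
    for gender, deceived, informed in records:
        changes.setdefault(gender, []).append(informed - deceived)
    return {gender: mean(deltas) for gender, deltas in changes.items()}


# Invented ratings for illustration only (not study data).
example = [("F", 5, 4), ("F", 6, 4), ("M", 5, 5), ("M", 4, 5)]
print(session_change_by_gender(example))
```

Under this layout, a negative mean change for female speakers alongside a non-negative change for male speakers would be the pattern consistent with the reported penalty once gender is known.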

18 pages, 4569 KiB

Open Access Article

Deep Learning for Neuromuscular Control of Vocal Source for Voice Production

by Anil Palaparthi, Rishi K. Alluri and Ingo R. Titze

Appl. Sci. 2024, 14(2), 769; https://doi.org/10.3390/app14020769 - 16 Jan 2024

Viewed by 668

Abstract

A computational neuromuscular control system was developed that generates lung pressure and three intrinsic laryngeal muscle activations (cricothyroid, thyroarytenoid, and lateral cricoarytenoid) to control the vocal source. In the current study, LeTalker, a biophysical computational model of the vocal system, was used as the physical plant. In LeTalker, a three-mass vocal fold model was used to simulate self-sustained vocal fold oscillation. A constant /ə/ vowel was used for the vocal tract shape, and the trachea was modeled on MRI measurements. The neuromuscular control system generates control parameters to achieve four acoustic targets (fundamental frequency, sound pressure level, normalized spectral centroid, and signal-to-noise ratio) and four somatosensory targets (vocal fold length and longitudinal fiber stress in each of the three vocal fold layers). The deep-learning-based control system comprises one acoustic feedforward controller and two feedback controllers (acoustic and somatosensory). Fifty thousand steady speech signals were generated with LeTalker to train the control system. The results demonstrated that the control system was able to generate the lung pressure and the three muscle activations such that the four acoustic and four somatosensory targets were reached with high accuracy. After training, the motor command corrections from the feedback controllers were minimal compared with the commands from the feedforward controller, except for thyroarytenoid muscle activation.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
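The controller architecture described above combines one feedforward command with two feedback corrections. The sketch below shows only that summation, assuming the outputs are ordered as lung pressure, cricothyroid, thyroarytenoid, and lateral cricoarytenoid activation; the trained deep networks are replaced by hypothetical linear maps (`linear_controller`), so it illustrates the signal flow, not the authors' models.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def linear_controller(weights):
    """Hypothetical stand-in for a trained network: maps an input vector x to W @ x."""
    return lambda x: matvec(weights, x)


def motor_command(acoustic_target, acoustic_error, somato_error,
                  feedforward, acoustic_fb, somato_fb):
    """Total motor command = feedforward command + acoustic and somatosensory corrections.

    Output order (assumed): lung pressure, cricothyroid, thyroarytenoid,
    lateral cricoarytenoid activation.
    """
    ff = feedforward(acoustic_target)  # 4 acoustic targets -> 4 commands
    fa = acoustic_fb(acoustic_error)   # 4 acoustic errors -> 4 corrections
    fs = somato_fb(somato_error)       # 4 somatosensory errors -> 4 corrections
    return [a + b + c for a, b, c in zip(ff, fa, fs)]


def diag(value, n=4):
    """n x n diagonal matrix, used here only to build toy controllers."""
    return [[value if i == j else 0.0 for j in range(n)] for i in range(n)]


ff = linear_controller(diag(0.5))
fb_acoustic = linear_controller(diag(0.1))
fb_somato = linear_controller(diag(0.1))
print(motor_command([1, 2, 3, 4], [0, 0, 0, 0], [0, 0, 0, 0], ff, fb_acoustic, fb_somato))
```

With zero errors the total command equals the feedforward command, which mirrors the reported finding that feedback corrections became minimal after training.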

20 pages, 5457 KiB

Open Access Article

Effect of Subglottic Stenosis on Expiratory Sound Using Direct Noise Calculation

by Biao Geng, Qian Xue, Scott Thomson and Xudong Zheng

Appl. Sci. 2023, 13(24), 13197; https://doi.org/10.3390/app132413197 - 12 Dec 2023

Viewed by 642

Abstract

Subglottic stenosis (SGS) is a rare yet potentially life-threatening condition that requires prompt identification and treatment. One of the primary symptoms of SGS is a tonal respiratory sound. To better understand the effect of SGS on expiratory sound, we used direct noise calculation to simulate sound production in a simplified axisymmetric configuration that included the trachea, the vocal folds, the supraglottal tract, and an open environmental space. This study focused on flow-sustained tones and explored the impact of several parameters: SGS severity, SGS distance, flow rate, and glottal opening size. The sound pressure level (SPL) of the expiratory sound increased with flow rate. SGS had little effect on the sound until its severity approached 75%, and SPL increased rapidly as severity approached 100%. The results also revealed that the tonal components of the sound came predominantly from hole tones, tract harmonics, and their coupling. The sound spectra were strongly influenced by constricting the glottis, which suggests that respiratory tasks involving maneuvers that change the glottal opening size could help gather more information on respiratory sound to aid in the diagnosis of subglottic stenosis.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
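Two quantities in this abstract can be made concrete: SPL, which is 20·log10 of the RMS pressure relative to 20 µPa, and stenosis severity. The sketch below assumes severity is the percent reduction in cross-sectional area relative to the unobstructed trachea; the paper may define severity differently, and the tone and areas are invented values.

```python
import math

P_REF = 20e-6  # Pa, standard reference pressure for airborne sound


def spl_db(pressure):
    """Sound pressure level (dB re 20 uPa) of acoustic pressure samples in Pa."""
    rms = math.sqrt(sum(p * p for p in pressure) / len(pressure))
    return 20.0 * math.log10(rms / P_REF)


def stenosis_severity(stenotic_area, tracheal_area):
    """Severity as percent reduction of cross-sectional area (assumed definition)."""
    return 100.0 * (1.0 - stenotic_area / tracheal_area)


# A 1 kHz tone of 0.2 Pa amplitude sampled at 48 kHz (illustrative values).
fs = 48_000
tone = [0.2 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]
print(round(spl_db(tone), 1))
print(stenosis_severity(0.25, 1.0))
```

Under this definition, the 75% threshold reported above corresponds to a residual lumen of one quarter of the tracheal area.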

17 pages, 4265 KiB

Open Access Article

Examining the Quasi-Steady Airflow Assumption in Irregular Vocal Fold Vibration

by Xiaojian Wang, Xudong Zheng, Ingo R. Titze, Anil Palaparthi and Qian Xue

Appl. Sci. 2023, 13(23), 12691; https://doi.org/10.3390/app132312691 - 27 Nov 2023

Viewed by 542

Abstract

The quasi-steady flow assumption (QSFA) is commonly used in the biomechanics of phonation. It approximates time-varying glottal flow with steady flow solutions based on frozen glottal shapes, ignoring unsteady flow behaviors and vocal fold motion. This study used numerical methods to examine the limitations of QSFA in human phonation, considering phonation frequency, air inertance in the vocal tract, and irregular glottal shapes. Two sets of irregular glottal shapes were examined through dynamic, pseudo-static, and quasi-steady simulations. To evaluate the validity of QSFA, the differences between dynamic and quasi-steady/pseudo-static simulations were measured for glottal flow rate, glottal wall pressure, and sound spectrum. The results show that errors in glottal flow rate and wall pressure predicted by QSFA were small at 100 Hz but significant at 500 Hz due to growing flow unsteadiness. Air inertia in the vocal tract worsened predictions when interacting with unsteady glottal flow. Flow unsteadiness also influenced the harmonic energy ratio, which is perceptually important. The effects of glottal shape and glottal wall motion on the validity of QSFA were found to be insignificant.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
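In its simplest textbook form, a quasi-steady model treats each instant as a frozen glottis and applies Bernoulli's relation, U(t) = A_min(t)·sqrt(2ΔP/ρ). The sketch below assumes that simplified form (no discharge coefficient, no pressure recovery) and a normalized RMS difference as the error metric; the study used full numerical simulations, and its actual error measures may differ.

```python
import math

RHO_AIR = 1.2  # kg/m^3, approximate density of air (assumed value)


def quasi_steady_flow(min_areas, transglottal_pressure, rho=RHO_AIR):
    """Bernoulli quasi-steady estimate U(t) = A_min(t) * sqrt(2 * dP / rho).

    Each entry of `min_areas` (m^2) is treated as a frozen glottal shape;
    flow inertia and wall motion are ignored, as QSFA assumes.
    """
    velocity = math.sqrt(2.0 * transglottal_pressure / rho)
    return [area * velocity for area in min_areas]


def relative_rms_error(reference, estimate):
    """Normalized RMS difference between a reference (e.g., dynamic) flow and an estimate."""
    num = math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)))
    den = math.sqrt(sum(r * r for r in reference))
    return num / den


flow = quasi_steady_flow([0.0, 5e-6, 1e-5], transglottal_pressure=600.0)
print(flow)
```

With ΔP = 600 Pa and ρ = 1.2 kg/m³ the jet velocity is sqrt(1000) ≈ 31.6 m/s. Because this estimate depends only on the instantaneous area, it cannot capture the inertance and unsteadiness effects that the study found to grow at 500 Hz.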

18 pages, 603 KiB

Open Access Feature Paper Article

The Influence of Stimulus Composition and Scoring Method on Objective Listener Assessments of Tracheoesophageal Speech Accuracy

by Philip C. Doyle, Natasha Goncharenko and Jeff Searl

Appl. Sci. 2023, 13(17), 9701; https://doi.org/10.3390/app13179701 - 28 Aug 2023

Viewed by 500

Abstract

Introduction: This study investigated how the stimulus composition of three speech intelligibility word lists and the choice between two scoring methods influence listener judgments of speech accuracy for five tracheoesophageal (TE) speakers. This was achieved through phonemic comparisons of TE speakers’ productions of stimuli from three intelligibility word lists: (1) the Consonant Rhyme Test, (2) the Northwestern Intelligibility Test, and (3) the Weiss and Basili (W&B) list. Methodology: Fifteen normal-hearing young adults served as listeners; all were trained in phonetic transcription (IPA), but none had previous exposure to any mode of postlaryngectomy alaryngeal speech. Speaker stimuli were presented to all listeners through headphones, and all stimuli were transcribed phonetically using an open-set response paradigm. Data were analyzed for individual speakers by stimulus list. Phonemic scoring was compared to a whole-word scoring method, and the types of errors observed were quantified by word list. Results: Individual speaker variability was noted, and its effect on the assessment of speech accuracy was identified. The phonemic scoring method was found to be a more sensitive measure of TE speech accuracy. The W&B list yielded the lowest accuracy scores of the three lists, which may indicate its greater sensitivity and potential clinical value. Conclusions: Overall, this study supports the use of open-set, phonemic scoring methods when evaluating TE speaker intelligibility. Future research should assess the specificity of these assessment tools in a larger sample of TE speakers who vary in speech proficiency.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
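The difference between the two scoring methods is easy to see in code: whole-word scoring gives no credit for a word with a single phoneme error, while phonemic scoring credits every correctly transcribed phoneme. The sketch below aligns phonemes with Python's `difflib.SequenceMatcher`, an assumption standing in for whatever alignment rules the study used; the target and response transcriptions are invented.

```python
from difflib import SequenceMatcher


def whole_word_score(targets, responses):
    """Proportion of words whose transcription matches the target exactly."""
    correct = sum(t == r for t, r in zip(targets, responses))
    return correct / len(targets)


def phonemic_score(targets, responses):
    """Proportion of target phonemes recovered in the listener's transcription.

    Phonemes are aligned with difflib's longest-matching-block heuristic
    (an assumed alignment method, not necessarily the study's).
    """
    matched = total = 0
    for target, response in zip(targets, responses):
        blocks = SequenceMatcher(None, target, response, autojunk=False).get_matching_blocks()
        matched += sum(block.size for block in blocks)
        total += len(target)
    return matched / total


# Words as tuples of IPA symbols (invented example).
targets = [("b", "æ", "t"), ("s", "ɪ", "p")]
responses = [("b", "æ", "t"), ("s", "ɪ", "b")]
print(whole_word_score(targets, responses))  # 0.5
print(phonemic_score(targets, responses))    # 5 of 6 phonemes
```

The single final-consonant error halves the whole-word score but costs only one phoneme out of six under phonemic scoring, which is one way the finer-grained measure can be more sensitive.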

15 pages, 1856 KiB

Open Access Article

Interaction of Voice Onset Time with Vocal Hyperfunction and Voice Quality

by Maria Francisca de Paula Soares, Marília Sampaio and Meike Brockmann-Bauser

Appl. Sci. 2023, 13(15), 8956; https://doi.org/10.3390/app13158956 - 04 Aug 2023

Viewed by 1130

Abstract

The main aim of the present work was to investigate whether vocal hyperfunction (VH), perceptual voice quality (VQ), gender, and phonetic environment influence Voice Onset Time (VOT). The group consisted of 30 adults, 19 women (mean age 46.1 ± 13.7 years) and 11 men (mean age 47.5 ± 11.0 years), who had either phonotraumatic vocal hyperfunction (PVH) or non-phonotraumatic vocal hyperfunction (NPVH). VQ was judged for overall severity of dysphonia (OS) and the subcharacteristics of roughness, breathiness, and strain. Phonetic variables such as vowel stress, syllable stress, and mode of speech task were analyzed. Four samples of syllables with [p] plus a vowel or diphthong were retrieved from CAPE-V sentence recordings. Acoustic analysis with Praat comprised VOT, mean fundamental frequency (fo), intensity (SPL, dB(A)), and the coefficient of variation of fundamental frequency (CV_fo, %). VOT was significantly influenced by OS (p ≤ 0.001) but not by VH condition (PVH versus NPVH) (p = 0.90). However, CV_fo was affected by VH condition (p = 0.02). Gender effects were found only for mean fo (p ≤ 0.001) and SPL (p = 0.01). All VQ characteristics (OS, roughness, breathiness, and strain) correlated with VOT (p ≤ 0.001) and SPL (p ≤ 0.001) but not with fo. In summary, VOT was affected by voice quality but not by vocal hyperfunction condition. VOT therefore has the potential to objectively describe the onset of voicing in voice diagnostics and may be one underlying objective correlate of perceived voice quality.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
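Two of the measures above reduce to simple arithmetic once the relevant time points and fo values are available: VOT is the interval from the [p] release burst to the onset of voicing, and CV_fo is the standard deviation of fo divided by its mean, in percent. The sketch below assumes burst and voicing-onset times have already been annotated (e.g., in Praat) and uses the sample standard deviation; the abstract does not state which variant was used, and the numbers are invented.

```python
from statistics import mean, stdev


def vot_ms(burst_time_s, voicing_onset_s):
    """Voice onset time in ms: voicing onset minus release burst (positive = voicing lag)."""
    return 1000.0 * (voicing_onset_s - burst_time_s)


def cv_fo_percent(fo_values_hz):
    """Coefficient of variation of fo in percent (sample SD / mean * 100, assumed variant)."""
    return 100.0 * stdev(fo_values_hz) / mean(fo_values_hz)


# Invented annotation times (s) and fo track (Hz) for illustration.
print(vot_ms(0.100, 0.115))
print(cv_fo_percent([100, 110, 90]))
```

Because CV_fo is normalized by the mean, it allows fo variability to be compared across speakers with different habitual pitch, such as women and men.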

18 pages, 2580 KiB

Open Access Article

Alzheimer’s Dementia Speech (Audio vs. Text): Multi-Modal Machine Learning at High vs. Low Resolution

by Prachee Priyadarshinee, Christopher Johann Clarke, Jan Melechovsky, Cindy Ming Ying Lin, Balamurali B. T. and Jer-Ming Chen

Appl. Sci. 2023, 13(7), 4244; https://doi.org/10.3390/app13074244 - 27 Mar 2023

Cited by 2 | Viewed by 2513

Abstract

Automated techniques to detect Alzheimer’s Dementia from audio recordings of spontaneous speech are now available, with varying degrees of reliability. Here, we present a systematic comparison across different modalities, granularities, and machine learning models to guide the choice of the most effective tools. Specifically, we present a multi-modal approach (audio and text) for the automatic detection of Alzheimer’s Dementia from recordings of spontaneous speech. Sixteen features, including four feature extraction methods not previously applied in this context (Energy–Time plots, Keg of Text Analytics, Keg of Text Analytics-Extended, and Speech to Silence ratio), were tested to determine their relative performance. These features span two modalities (audio vs. text) at two resolution scales (frame-level vs. file-level). Comparing the accuracy obtained with these features, we found that text-based classification outperformed audio-based classification, with the best performance reaching 88.7%, surpassing other reports to date on the same dataset. For text-based classification, the best file-level feature performed 9.8% better than the frame-level feature. For audio-based classification, however, the best frame-level feature performed 1.4% better than the best file-level feature. This multi-modal, multi-model comparison at high and low resolution offers insights into which approach is most efficacious, depending on the sampling context. Such a comparison of Alzheimer’s Dementia classification accuracy across frame-level and file-level granularities, audio and text modalities, and different machine learning models on the same dataset has not previously been reported. We also demonstrate that a subject’s speech captured in short time frames, and its dynamics, may contain enough inherent information to indicate the presence of dementia.
Overall, such a systematic analysis facilitates the identification of Alzheimer’s Dementia quickly and non-invasively, potentially leading to more timely interventions and improved patient outcomes.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
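The frame-level vs. file-level distinction can be illustrated with audio energy: a per-frame energy contour is a frame-level feature (a sequence per recording), while a single summary such as a speech-to-silence ratio is a file-level feature. The sketch below is a minimal illustration assuming mean-squared amplitude per frame and a fixed energy threshold for speech frames; it is not the authors' Energy–Time or Speech to Silence ratio implementation, and the signal is invented.

```python
def frame_energies(samples, frame_len, hop):
    """Frame-level feature: mean squared amplitude per frame (an energy-time contour)."""
    return [
        sum(x * x for x in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def speech_to_silence_ratio(energies, threshold):
    """File-level feature: number of speech frames divided by number of silent frames.

    A frame counts as speech when its energy reaches `threshold` (assumed rule).
    """
    speech = sum(e >= threshold for e in energies)
    silence = len(energies) - speech
    return speech / silence if silence else float("inf")


samples = [0.0] * 4 + [1.0] * 8  # invented signal: 1 silent frame, 2 voiced frames
energies = frame_energies(samples, frame_len=4, hop=4)
print(energies)                                          # [0.0, 1.0, 1.0]
print(speech_to_silence_ratio(energies, threshold=0.5))  # 2.0
```

The frame-level list preserves temporal dynamics that a classifier can exploit, whereas the file-level ratio compresses the whole recording into one number.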

19 pages, 3880 KiB

Open Access Article

Auditory Perception of Impulsiveness and Tonality in Vocal Fry

by Vinod Devaraj, Imme Roesner, Florian Wendt, Jean Schoentgen and Philipp Aichinger

Appl. Sci. 2023, 13(7), 4186; https://doi.org/10.3390/app13074186 - 25 Mar 2023

Cited by 2 | Viewed by 916

Abstract

Vocal fry is a voice quality that occurs in healthy voices but can also be a sign of a voice disorder. In this study, we investigated the relationship between voice production parameters, a dedicated psychoacoustic feature, and the perceptual aspects of vocal fry. Two perceptual experiments were carried out to determine whether the fundamental frequency, the open quotient, and the glottal area pulse skewness affect the perception of vocal fry in synthetic vowels. Thirteen listeners participated in the experiments and rated the following attributes: fry (binary, yes/no) and impulsiveness, tonality, and naturalness (7-point Likert scales). The results suggest that the perception of vocal fry is mainly triggered by a low fundamental frequency, but the open quotient also plays a role, with narrower glottal area pulses slightly increasing the probability of perceived fry. Perceived tonality is inversely related to perceived impulsiveness. Listeners’ internal reference standards appear to have fixed elements but may also be affected by anchoring and the short-term (i.e., within-vowel) context of the stimuli. In addition, the prominence of the peaks in the loudness curve over time appears to be related to gradations of fry.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
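The open quotient is the fraction of a glottal cycle during which the glottis is open, so narrower glottal area pulses mean a lower open quotient. The sketch below computes it from one cycle of sampled glottal area, and swaps in the speed quotient (opening duration over closing duration) as one established measure of pulse asymmetry; the paper's skewness parameter may be defined differently, and the area samples are invented.

```python
def open_quotient(area_cycle, threshold=0.0):
    """Fraction of one glottal cycle during which the glottal area exceeds `threshold`."""
    return sum(a > threshold for a in area_cycle) / len(area_cycle)


def speed_quotient(area_cycle, threshold=0.0):
    """Opening duration divided by closing duration within one cycle (in samples).

    Values above 1 indicate a pulse whose peak occurs late in the open phase.
    """
    open_idx = [i for i, a in enumerate(area_cycle) if a > threshold]
    if not open_idx:
        raise ValueError("glottis never opens in this cycle")
    start, end = open_idx[0], open_idx[-1]
    peak = max(range(start, end + 1), key=lambda i: area_cycle[i])
    opening, closing = peak - start, end - peak
    return opening / closing if closing else float("inf")


cycle = [0, 0, 0, 0, 1, 2, 3, 2, 0, 0]  # invented glottal area samples, one period
print(open_quotient(cycle))   # 0.4
print(speed_quotient(cycle))  # 2.0
```

Counting samples keeps the sketch simple; with real recordings, cycle boundaries and the opening threshold would first have to be estimated from the area waveform.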

18 pages, 1256 KiB

Open Access Feature Paper Article

Acoustic Voice and Speech Biomarkers of Treatment Status during Hospitalization for Acute Decompensated Heart Failure

by Olivia M. Murton, G. William Dec, Robert E. Hillman, Maulik D. Majmudar, Johannes Steiner, John V. Guttag and Daryush D. Mehta

Appl. Sci. 2023, 13(3), 1827; https://doi.org/10.3390/app13031827 - 31 Jan 2023

Cited by 1 | Viewed by 2376

Abstract

This study investigates acoustic voice and speech features as biomarkers for acute decompensated heart failure (ADHF), a serious escalation of heart failure symptoms, including breathlessness and fatigue. ADHF-related systemic fluid accumulation in the lungs and laryngeal tissues is hypothesized to affect phonation and respiration for speech. Daily speech recordings from 52 patients undergoing inpatient ADHF treatment were analyzed to identify voice and speech biomarkers for ADHF and to examine the trajectory of these biomarkers during treatment. Results indicated that, after ADHF treatment, speakers produced more stable phonation, a creakier voice, faster speech rates, and longer phrases than before treatment. This project builds on ongoing work to develop a method of monitoring ADHF using speech biomarkers and presents a more detailed understanding of the relevant voice and speech features.

(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
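Two of the reported changes, faster speech rate and longer phrases, can be computed once a recording has been segmented into phrases. The sketch below assumes a hypothetical list of (start time, end time, syllable count) tuples per recording and defines speech rate over the span from first phrase onset to last phrase offset; these are illustrative definitions, not the authors' feature extraction, and the segment values are invented.

```python
from statistics import mean


def speech_rate(phrases):
    """Syllables per second from first phrase onset to last phrase offset.

    `phrases` is a hypothetical list of (start_s, end_s, n_syllables) tuples,
    e.g., from manual or automatic segmentation of a recording.
    """
    total_syllables = sum(n for _, _, n in phrases)
    duration = phrases[-1][1] - phrases[0][0]
    return total_syllables / duration


def mean_phrase_duration(phrases):
    """Average phrase length in seconds."""
    return mean(end - start for start, end, _ in phrases)


pre = [(0.0, 2.0, 8), (2.5, 4.0, 6)]  # invented pre-treatment phrases
post = [(0.0, 4.0, 16)]               # invented post-treatment phrase
print(speech_rate(pre), speech_rate(post))
print(mean_phrase_duration(pre), mean_phrase_duration(post))
```

Tracking such values across daily recordings would give a per-patient trajectory of the kind the study examined during treatment.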