Next Article in Journal
Removal of Riprap within Channelized Rivers: A Solution for the Restoration of Lateral Channel Dynamics and Bedload Replenishment?
Next Article in Special Issue
Special Issue on Current Trends and Future Directions in Voice Acoustics Measurement
Previous Article in Journal
Characterization and Differentiation of Wild and Cultivated Berries Based on Isotopic and Elemental Profiles
Previous Article in Special Issue
Determination of Harmonic Parameters in Pathological Voices—Efficient Algorithm
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Laryngeal Imaging Study of Glottal Attack/Offset Time in Adductor Spasmodic Dysphonia during Connected Speech

by
Maryam Naghibolhosseini
1,*,
Stephanie R. C. Zacharias
2,3,
Sarah Zenas
4,
Farrah Levesque
5 and
Dimitar D. Deliyski
1
1
Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
2
Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, AZ 85259, USA
3
Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, AZ 85054, USA
4
Lyman Briggs College, Michigan State University, East Lansing, MI 48825, USA
5
Warrington College of Business, University of Florida, Gainesville, FL 32611, USA
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(5), 2979; https://doi.org/10.3390/app13052979
Submission received: 3 November 2022 / Revised: 14 February 2023 / Accepted: 21 February 2023 / Published: 25 February 2023
(This article belongs to the Special Issue Current Trends and Future Directions in Voice Acoustics Measurement)

Abstract

:

Featured Application

The potential application of this work is for assessment and diagnosis of adductor spasmodic dysphonia.

Abstract

Adductor spasmodic dysphonia (AdSD) disrupts laryngeal muscle control during speech and, therefore, affects the onset and offset of phonation. In this study, the goal is to use laryngeal high-speed videoendoscopy (HSV) to measure the glottal attack time (GAT) and glottal offset time (GOT) during connected speech for normophonic (vocally normal) and AdSD voices. A monochrome HSV system was used to record readings of six CAPE-V sentences and part of the “Rainbow Passage” from the participants. Three raters visually analyzed the HSV data using a playback software to measure the GAT and GOT. The results show that the GAT was greater in the AdSD group than in the normophonic group; however, the clinical significance of the amount of this difference needs to be studied further. More variability was observed in both GATs and GOTs of the disorder group. Additionally, the GAT and GOT time series were found to be nonstationary for the AdSD group while they were stationary for the normophonic voices. This study shows that the GAT and GOT measures can be potentially used as objective markers to characterize AdSD. The findings will potentially help in the development of standardized measures for voice evaluation and the accurate diagnosis of AdSD.

1. Introduction

The use of videostroboscopy in clinical voice assessment is limited in providing enough details about the vocal fold vibrations. This is due to the low frame rate of videostroboscopy, which can only produce an illusory slow-motion of a vibratory cycle [1]. The synchronization of the videostroboscopy to acoustic signals limits its use to capturing the vocal folds vibration only when the vibration is periodic [2,3]. Hence, these limitations become more prominent when studying aperiodic vocal fold vibrations, which is often the case in patients with voice disorders. The use of laryngeal high-speed videoendoscopy (HSV) overcomes the limitations of videostroboscopy with low temporal resolution [4,5]. HSV can provide large amounts of details on the intra-cycle vibratory characteristics of the vocal folds, and it can be used to study dysphonic voices [6,7,8]. Utilizing HSV in connected speech enables us to study voice disorders in a natural context, where voice disorders reveal themselves [9,10].
Adductor spasmodic dysphonia (AdSD) is a type of focal dystonia that affects the laryngeal muscles and interferes with vocal fold vibrations and voice production. Perceptually, AdSD results in a strained, broken, breathy, and strangled voice [11] and is commonly misdiagnosed with muscle tension dysphonia or vocal tremor [12]. This leads to delayed, ineffective, time-consuming, and expensive treatments, causing obvious patient frustration [12]. Case history, auditory-perceptual voice evaluation, acoustic analysis, and videostroboscopy are widely used for the diagnosis of AdSD and treatment efficacy assessment [13]. A three-tiered approach including a screening questionnaire, speech examination, and nasolaryngoscopy examination was proposed for AdSD diagnosis [14]. Although relatively useful [12,15,16], the diagnosis may still not be certain using these tools including the three-tiered approach [16]. Currently, there are no standardized diagnostic tests for AdSD, and the clinical diagnosis depends on the specialists in the voice care teams [14].
Current literature studying AdSD using HSV has focused on steady-state phonation [17,18]. Patel et al. noted micromotions and oscillatory breaks in AdSD patients when compared to muscle tension dysphonia [17], while Chen and colleagues studied the onset of voicing and found a subset of their AdSD patients presented with differences in the onset phonation gestures [18]. However, AdSD symptoms typically present during connected speech, not sustained phonation. Therefore, studying sustained phonation may limit task-specific characteristics of AdSD. We have previously developed several methods for automated analysis of HSV data during connected speech [9,10,19,20,21,22,23,24]; however, we still need to conduct manual analysis in order to build new measures and validate our automated methods.
The purpose of the current study is to expand on the above-mentioned studies by obtaining connected speech samples in patients with AdSD and those with no voice disorder (also known as normophonic) voices and quantitatively analyze the beginning and end of each phonation. Toward this goal, the glottal attack time (GAT) and glottal offset time (GOT) were measured for AdSD and normophonic voices in the connected speech samples. The GAT was defined as the time between the first oscillation and the first contact of the vocal folds at the phonation onset [25]. AdSD is characterized by difficulty at the onset of voicing [26]; hence, the GAT would be a suitable measure to capture the abrupt voice initiation [25]. The transitory phonatory phases at the onset and offset of phonation have clinical importance and have been studied during sustained vocalization [27]. The GAT measure was validated previously using electroglottography and acoustic data to characterize the onset of phonation [25]. The GOT was defined as the time difference between the last oscillation and the last contact of the vocal folds [28]. In this study, the GAT and GOT measurement for several participants was conducted by three raters to increase the measurement reliability. The measures were compared and the sources of discrepancies between the raters were determined by two independent evaluators. The inter-rater reliability was assessed and the rater with the most reliable measurements completed the GAT/GOT measurements for the additional participants. Parametric and non-parametric statistical analyses were conducted to compare the GAT and GOT of the normophonic and AdSD groups.

2. Materials and Methods

2.1. Data Collection

The data were obtained from five normophonic voices (three female and two male) and five with AdSD (four female and one male). HSV data were collected during the reading of the six CAPE-V sentences and the first six sentences of the Rainbow Passage. The HSV system consisted of a Photron FASTCAM mini AX200, M4 (32GB), monochrome high-speed camera (Photron Inc., San Diego, CA, USA), coupled with a flexible nasolaryngoscope. The recordings were conducted at a frame rate of 4000 frames per second (fps) with a spatial resolution of 256 × 224 pixels. HSV recordings of each participant included 200,000–400,000 frames in total and were stored as mraw files (Photron camera format) on an MSI GL63 8SE Laptop (the camera control system) for further analysis.

2.2. Participants

The AdSD participants were recruited among the treatment-seeking population at Mayo Clinic, Scottsdale, AZ, diagnosed with AdSD (mean age of 69.2, Table 1). During their initial visits, all participants were seen by both a speech-language pathologist (SLP) and a laryngologist. A voice evaluation and laryngoscopy were conducted during sustained phonation and connected speech, and stimulability tasks were completed. After the full voice evaluation, a diagnosis of AdSD was assigned via multidisciplinary consensus involving a fellowship-trained laryngologist and an SLP, specialized in voice disorders, based on the inclusion criteria (shown in Table 2). The participants with the AdSD diagnosis had no evidence of tremor via consensus between the treating SLP and laryngologist. Participants in the normophonic group were recruited among Mayo Clinic employees and by word of mouth (mean age 47.6, Table 1). All normophonic participants met the inclusion criteria in Table 2. All participants consented to the study. This study was approved by the Institutional Review Board at Mayo Clinic and Michigan State University. All study examinations were carried out by an SLP with 15 years of clinical voice experience.

2.3. Data Analysis

The timestamps for the first oscillation, first contact, last oscillation, and last contact of the vocal folds were determined in the HSV data by three raters. The GAT and GOT were then calculated based on the time difference between the oscillation and contact frames. The GATs and GOTs were measured during voice production by the true vocal folds. The oscillations of the false folds were only observed in one subject, i.e., S3M, and accordingly, the GAT and GOT for the false vocal folds were measured for this participant.
The raters visually analyzed the data using playback software (Phantom Camera Control, PCC) with a playback speed of 30 fps. They were allowed to adjust the playback speed as low as 1 fps to select the accurate frame numbers for the contact. The raters did not go below 15 fps for detecting the oscillation frame. The raters applied a Gaussian filter (with a kernel size of 3 × 3) and adjusted the gamma, gain, brightness, and contrast of the frames to facilitate the visual measurements. This filter was used to remove the pattern noise due to the camera lens and endoscope being slightly out of focus. The three raters completed their measurements for four participants and compared their measurements for every vocalization after completing the analysis. To do so, they met over video call and went over the vocalizations for which their measurements were more than 10 frames apart for the contacts and 20 frames apart for the oscillations. Accordingly, the measurements of GAT and GOT by each pair of raters were subtracted from each other (Rater 1-Rater 2, Rater 1-Rater 3, and Rater 2-Rater 3), and the Wald 99% confidence interval was calculated for inter-rater reliability assessment. The raters watched the videos together and came to a consensus about their measurements. Based on this consensus meeting, Rater 3′s measurements were found to be the most reliable measures. Rater 3 improved the accuracy of her measurement based on the consensus meeting and completed the data analysis for the remaining five participants. In this study, Rater 3′s measurements for all participants were used for the data analysis. Additionally, two independent evaluators compared the measurements completed by the three raters to assess the similarity/dissimilarity of measurements among raters and determined the sources of discrepancies. The three Raters reviewed more than 2.6 million HSV images (frames) for the data analysis. Participant N1 was excluded from the analysis due to extremely dark HSV data.
In addition to the descriptive statistics, parametric (ANOVA) and non-parametric tests (Kruskal–Wallis) were used to compare the GAT/GOT between the normophonic and AdSD groups. Moreover, a generalized linear model with repeated observations was used based on the individual measures from the participants. For a regular general linear model (GLM or ANOVA), the estimations are carried out with the covariance matrix forced to sphericity (homoskedasticity); however, this assumption is violated when considering the individual data in this study. Therefore, in this study, rather than using the GLM, the models were built using the PROC MIXED (SAS 9.4) procedure with repeated observations. Instead of the least squares solution, the generalized linear model used a maximum likelihood solution without the sphericity assumption (i.e., the Restricted Maximum Likelihood, REML, solution). In the repeated measures model, additional variables were introduced including gender, age, order of measurements, and interaction terms. At last, the dynamics of the GAT and GOT were evaluated using a time-series analysis.

3. Results

The three raters’ measurements for the first and last oscillation, also the first and last contact were within the limits of +/− 18 frames based on the Wald 99% confidence interval. The following is based on what the raters and evaluators concluded as the reasons for their agreement/disagreement with the measurements. More agreement was observed in the measurement of the contact frame number than in the oscillation frame number. More discrepancies in the measurement of the oscillation frame number were found at the offsets of vocalization than at the onsets. The raters were found to be more in agreement when they went further in the analysis of one participant’s data. Generally, more agreement was observed for the measurements of normophonic voices. One of the challenges in the measurement of the disordered voices was determining whether an onset belonged to a new phonation or that there was a delayed onset due to a phonatory break during the same phonation. In the measurements by Rater 1, in several instances, a continuation of phonation after an obstruction of the vocal folds was considered as a new phonation by mistake, which was more apparent for the disordered participants. The partial obstructions of the view of the vocal folds during phonation were found to be the main source of missing measurements by some raters. Both Rater 1 and 2 found Rater 3′s measurements more accurate for the four participants. Rater 3 adjusted her measurements further during the consensus meeting with the other raters. Rater 3 completed the analysis of the remaining participants considering what she had learned during the consensus meeting.
The average GAT in the normophonic group varied from 13.5 to 15.4 ms, and from 11.2 to 22.3 ms for the AdSD group (Figure 1 and Figure 2). The AdSD group showed much greater variability for the GAT compared to the normophonic group: the average coefficient of variability (CV = std/mean × 100) for the AdSD group was 73% compared to 45% for the normophonic group. Greater variability in the AdSD group in comparison with the normophonic group was also observed for the GOTs with an average CV of 70% for the AdSD group vs. 44% for the normophonic group. The AdSD group had higher average GAT values (16.8 ms) than the normophonic group (14.7 ms), but there was no statistically significant difference. The GOT values ranged between 23.7 and 31.5 ms for the normophonic group and between 17.4 and 30.5 ms for the AdSD group. Although the average GOT was slightly higher in the normophonic group (26.4 ms) than in the AdSD group (25.4 ms), no statistically significant difference was observed. The results were confirmed with both parametric tests (ANOVA) and non-parametric tests (Kruskal–Wallis).
We also investigated the relationship between GAT and GOT measures separately for the two groups. Based on the aggregate data, the correlation between the GAT and GOT was very high (Pearson r = 0.635 for the normophonic and r = 0.79 for the AdSD group). The bar graph of the GATs for the normophonic and AdSD group is shown in Figure 1. The data for different subjects are shown with different color bars as indicated in the legend. The median of each participant’s data is shown by the central horizontal lines in the boxes, and the edges of the boxes represent the 25th and 75th percentiles. The whiskers extend to the extreme data points after the outlier removal and the outliers are shown by the individual stars. A similar bar graph for the GOT values for the normophonic and AdSD group is shown in Figure 2. Although more outliers are observed in the AdSD group for the GAT measures (Figure 1), the number of outliers for the GOTs in the normophonic group is higher than in the AdSD group (Figure 2).
The number of onsets/offsets for which the GAT/GOT were measured were compared between the AdSD and normophonic groups. Moreover, the comparison of the number of GAT/GOT with values equal to zero was conducted. None of the differences were statistically significant, but the direction of the effect sizes was as follows: The AdSD group had a lower average number of onsets (42.6 vs. 54) and offsets (50.4 vs. 62.3), a higher number of onsets with zero GATs (4.6 vs. 1.75), and a higher number of offsets with zero GOTs (8.6 vs. 4.8). The statistical tests were repeated after removing the GATs and GOTs with zero values associated with hard glottal attacks/offsets. Accordingly, the AdSD group showed statistically significantly higher average values for GAT than the normophonic group (p = 0.037). The GOT averages for the AdSD group were slightly higher than the normophonic group, but the difference was not statistically significant. The results were confirmed with both parametric tests (ANOVA) and non-parametric tests (Kruskal–Wallis).
The generalized linear model with repeated measures showed that the AdSD group had, on average, higher GAT values than the normophonic group (by the average time of 2.4 ms to 4.0 ms), and the difference was statistically significant. The repeated measures model for the GOT did not achieve a clear picture of the effect of the disorder, as it did not show statistically significant differences between the two groups. No effect of gender or age was observed. The GAT and GOT of the normophonic male participants were higher than those of the normophonic females.
No statistically significant differences were observed for the GATs/GOTs between the true and false vocal folds onsets/offsets for S3M. However, the average GATs/GOTs of the false vocal folds were longer with higher variation than those of the true vocal folds. The average GAT was 19.4 ms (std of 5.2 ms) for the true vocal folds and 25.9 ms (std of 18.9 ms) for the false vocal folds in S3M. The average GOT was 25.4 ms (std of 14.1 ms) for the true vocal folds and 30.8 ms (std of 17.1 ms) for the false vocal folds. The CV value of 73% for the GATs of the false vocal folds and 27% for the true vocal folds was also indicative of the much greater variability of the GAT of the false vocal folds. The distributions of the GAT and GOT for the true and false vocal folds are shown in Figure 3 and Figure 4, respectively. As can be seen, the GAT/GOT values for the true and false vocal folds overlap; however, greater variability and more extreme values can be observed for the false vocal folds’ measures.
The dynamics of the individual GAT and GOT time series were analyzed visually in the order they appeared in each participant (see Figure 5). Figure 5 shows the GAT values for different vocalizations for the normophonic group (panel (a)) and AdSD voices (panel (b)). As can be seen in Figure 5a, the dynamics of GAT is clearly a stationary time series for the normophonic group (constant mean across different groups of vocalizations) with homoskedasticity (constant variance across vocalizations). There are only two onsets for participant N11F with larger delays which are outliers. The GAT dynamics for the AdSD group is a nonstationary time series (Figure 5b). As can be seen in Figure 5b, the mean is constantly changing for different groups of vocalizations, and they are periods of small and large variances (heteroscedasticity). The GAT time-series for AdSD shows persistent big swings from one onset to another for almost all patients except for S6F.
Overall, there was a big difference in the dynamics of the GAT time series, representing consecutive onsets, between patients. The contrast between the normophonic and AdSD groups for GOT was slightly different than that of the GAT. The normophonic participants’ GOT dynamics were still stationary but there was a certain large variability (heteroskedasticity). However, the GOT dynamics for the AdSD patients were extremely nonstationary and extremely volatile (constant rapid changes from one offset to another) to a much larger degree than what was observed for the normophonic group.

4. Discussion

The results show the GATs are longer in the AdSD group than in the normophonic group, which was confirmed by the generalized linear model analysis. The average GAT was 16.8 and 14.7 ms in the AdSD and normophonic groups, respectively. This was supported by previous research that showed a longer latency between the muscular activation and the phonation onset in AdSD compared to the normophonic voices [29]. Moreover, the steady-state onset delay, the time between the first oscillation and a steady-state oscillation, was observed to be longer in AdSD than in the normophonic voices [18]. While the steady state delay measure is different from the GAT measure in this study, the findings are in line with those of the current work. The GAT values in this study were longer than the GATs measured by Orlikoff et al. [25] for comfortable productions of several sustained vocalizations. The difference between our measurements and those of Orlikoff et al. could be due to the shorter vocal attack times for the selected vowel in their study, while our study measures the GAT for different vocalizations in connected speech. Although the average GOT was slightly shorter in the AdSD group (25.4 vs. 26.4 ms), the parametric and non-parametric analysis did not show any significant differences between the AdSD and normophonic groups. The longer GAT and slightly shorter GOT were also observed in a previous study by our group with a smaller sample size and a different data collection HSV system [28].
A greater variability was observed in both the GAT and GOT in the AdSD group, which was expected as the disorder led to more uncertain behaviors and irregularity in the vibrations of the vocal folds. Considering that the AdSD patients have less neurological control over their laryngeal muscles during voice production [30,31,32], the longer GAT and the higher variability in the GAT and GOT of AdSD can be explained. The AdSD patients showed nonstationary vocal fold dynamics with large variations in the GAT and even larger ones for the GOT. The dynamics of vocal fold vibration in the normophonic patients were stationary with small variations in the GAT and larger variations in the GOT. The more volatile behavior of the GOT in the normophonic group than that of the GAT was also observed through the larger number of outliers in GOTs. Therefore, GAT might be a more valuable measure than the GOT since it was found to be statistically different between the normophonic and AdSD groups and a more stable measure within the normophonic group. The high correlations between the GAT and GOT values in both groups indicate that the participants with greater GAT values also tend to have greater GOTs and vice versa. A larger number of onsets and offsets with zero GAT and GOT was observed in the AdSD group which could be related to the disorder. Therefore, the number of the hard glottal attack/offset could be potentially used as another measure to characterize the AdSD. The AdSD group also showed a lower average number of onsets and offsets with measurable GATs/GOTs. This was due to the more excessive laryngeal maneuvers in connected speech in the AdSD group that led to more blockage of the view of the vocal folds and a smaller number of measurable GATs/GOTs. No effect of sex and age was observed using the generalized linear model, which could be due to the small sample size in this study.
The comparison of GAT/GOT measured for the true and false vocal folds in S3M showed some overlap. As the false vocal folds’ vibration is a compensatory strategy, more variability and extreme values were observed for GATs/GOTs of the false vocal folds. The GAT and GOT values for the false vocal folds in S3M were the highest among the AdSD group. Although S3M was a male, and this observation might be explained by the lower fundamental frequency of the voice, a similar observation was not made for the GAT/GOT values of the true vocal folds in this participant. Therefore, the longer GAT/GOT of the false vocal folds could be related to the vibratory characteristics of these folds, which will be further investigated in the future.
The inter-rater reliability assessment showed that the measurements by the three raters were reliable as the upper limit of the 99% confidence interval was less than 18 frames. This frame number (criterion) was considered acceptable in the current study since the fundamental frequency of the voices ranged between 100 and 250 Hz. Some discrepancies between the measurements among the raters were expected due to the nature of the data and the manual measurement. Since the glottal closure can last for several frames, it was possible for the raters to select a slightly different contact frame number. Indicating the first/last oscillation was also challenging since the raters needed to select the frame while the video was being played, and, depending on when the video was stopped, the selected frame could have been different. Additionally, due to the small amplitudes of oscillations, more discrepancies between the oscillation frame numbers by the three raters in comparison with the contact frame numbers were expected. These discrepancies were more noticeable at the offset of phonation due to the longer GOTs than the GATs and more (difficult to detect) miniature oscillations at the offset. The software settings, the computers, and the room brightness were different among the raters, which could have been another source for some of the discrepancies. As the raters became more comfortable with the anatomy/physiology of vocal folds, their measurements became more reliable.
Although the observations of the GAT and GOT in this study are in line with our understanding of the AdSD, the dynamic nature of the GAT and GOT measurements in the HSV data limits the accuracy of manual measurement. The detection of the oscillation/contact frames depended on the quality of the data, the image adjustments by the raters, and the visual acuity of the raters. Additionally, the playback speed of the video and the small amplitudes of the vibrations impacted the accurate measurement of the oscillation frames. These challenges became more noticeable when analyzing the AdSD data due to the higher irregularity, unexpected behavior in the vocal fold vibration, and occurrence of phonatory breaks. In future, an automated method will be developed to overcome the challenges of manual measurements. A more detailed comparison of the GAT/GOT values for equivalent vocalizations needs to be performed to provide more accurate assessments of GAT/GOT differences between the normophonic and AdSD voices, which is among our future goals. Additionally, the relationship between GAT/GOT measures and the severity of AdSD, measured by perceptual voice assessment, will be studied. These works are important for indicating the clinical significance of these measures, as the statistical significance alone is not sufficient to interpret the findings of the study [33]. The small sample size of the current study made it difficult to draw conclusions about the impact of sex or age on the measurements. More data are being collected to address the limitations of this study due to the small sample size.

Author Contributions

Conceptualization, M.N., S.R.C.Z., S.Z. and D.D.D.; methodology, M.N. and S.Z., and F.L.; formal analysis, M.N., S.Z. and F.L.; investigation, M.N., S.R.C.Z. and D.D.D.; resources, M.N., S.R.C.Z. and D.D.D.; writing—original draft preparation, M.N.; writing—review and editing, M.N., S.R.C.Z. and D.D.D.; supervision, M.N.; funding acquisition, M.N. and D.D.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institutes of Health (NIH) National Institute on Deafness and Other Communication Disorders (NIDCD), K01DC017751 (PI: Naghibolhosseini), R01DC019402 (PI: Deliyski), and R21DC020003 (PI: Naghibolhosseini); as well as by the Michigan State University through the Discretionary Funding Initiative (PI: Deliyski), the Trifecta Initiative Matching Funds Award (PI: Naghibolhosseini), and the Sandi Smith Fellowship, Michigan State University Health and Risk Communication Center.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Boards of Michigan State University and Mayo Clinic, Scottsdale, AZ.

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

Data from this study can be made available upon request to the corresponding author after executing appropriate data-sharing agreements.

Acknowledgments

The authors would like to thank Trent Henry, Corinne Hoedeman, Nicole Heinz, and Mikayla Norton for their help with the data analysis.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Mehta, D.D.; Deliyski, D.D.; Hillman, R.E. Commentary on Why Laryngeal Stroboscopy Really Works: Clarifying Misconceptions Surrounding Talbot’s Law and the Persistence of Vision; American Speech-Language-Hearing Association: Rockville, MD, USA, 2010. [Google Scholar]
  2. DDeliyski, D.; Petrushev, P.P.; Bonilha, H.S.; Gerlach, T.T.; Martin-Harris, B.; Hillman, R.E. Clinical implementation of laryngeal high-speed videoendoscopy: Challenges and evolution. Folia Phoniatr. Et Logop. 2008, 60, 33–44. [Google Scholar] [CrossRef] [PubMed]
  3. Mehta, D.D.; Hillman, R.E. Current role of stroboscopy in laryngeal imaging. Curr. Opin. Otolaryngol. Head Neck Surg. 2012, 20, 429–436. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Kist, A.M.; Durr, S.; Schutzenberger, A.; Dollinger, M. OpenHSV: An open platform for laryngeal high-speed videoendoscopy. Sci. Rep. 2021, 11, 13760. [Google Scholar] [CrossRef]
  5. Deliyski, D.D.; Hillman, R.E. State of the art laryngeal imaging: Research and clinical implications. Curr. Opin. Otolaryngol. Head Neck Surg. 2010, 18, 147–152. [Google Scholar] [CrossRef] [PubMed]
  6. Zacharias, S.R.; Deliyski, D.D.; Gerlach, T.T. Utility of laryngeal high-speed videoendoscopy in clinical voice assessment. J. Voice 2018, 32, 216–220. [Google Scholar] [CrossRef] [PubMed]
  7. Poburka, B.J.; Patel, R.R.; Bless, D.M. Voice-vibratory assessment with laryngeal imaging (VALI) form: Reliability of rating stroboscopy and high-speed videoendoscopy. J. Voice 2017, 31, 513.e1–513.e14. [Google Scholar] [CrossRef]
  8. Kunduk, M.; Dollinger, M.; McWhorter, A.J.; Svec, J.G.; Lohscheller, J. Vocal fold vibratory behavior changes following surgical treatment of polyps investigated with high-speed videoendoscopy and phonovibrography. Ann. Otol. Rhinol. Laryngol. 2012, 121, 355–363. [Google Scholar] [CrossRef] [PubMed]
  9. Yousef, A.M.; Deliyski, D.D.; Zacharias, S.R.; Naghibolhosseini, M. Deep-Learning-Based Representation of Vocal Fold Dynamics in Adductor Spasmodic Dysphonia during Connected Speech in High-Speed Videoendoscopy. J. Voice 2022. [Google Scholar] [CrossRef]
  10. Yousef, A.; Deliyski, D.D.; Zacharias, S.R.; Naghibolhosseini, M. Detection of Vocal Fold Image Obstructions in High-Speed Videoendoscopy During Connected Speech in Adductor Spasmodic Dysphonia: A Convolutional Neural Networks Approach. J. Voice 2022. [Google Scholar] [CrossRef] [PubMed]
  11. Tanner, K.; Roy, N.; Merrill, R.M.; Kimber, K.; Sauder, C.; Houtz, D.R.; Doman, D.; Smith, M.E. Risk and protective factors for spasmodic dysphonia: A case-control investigation. J. Voice 2011, 25, e35–e46. [Google Scholar] [CrossRef] [PubMed]
  12. Whurr, R.; Lorch, M. Review of differential diagnosis and management of spasmodic dysphonia. Curr. Opin. Otolaryngol. Head Neck Surg. 2016, 24, 203–207. [Google Scholar] [CrossRef] [Green Version]
  13. Cannito, M.P.; Woodson, G.E.; Murry, T.; Bender, B. Perceptual analyses of spasmodic dysphonia before and after treatment. Arch. Otolaryngol.—Head Neck Surg. 2004, 130, 1393–1399. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Ludlow, C.L.; Adler, C.H.; Berke, G.S.; Bielamowicz, S.A.; Blitzer, A.; Bressman, S.B.; Hallett, M.; Jinnah, H.; Juergens, U.; Martin, S.B.; et al. Research priorities in spasmodic dysphonia. Otolaryngol.—Head Neck Surg. 2008, 139, 495–505. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Barkmeier, J.M.; Case, J.L. Differential diagnosis of adductor-type spasmodic dysphonia, vocal tremor, and muscle tension dysphonia. Curr. Opin. Otolaryngol. Head Neck Surg. 2000, 8, 174–179. [Google Scholar] [CrossRef]
  16. Hintze, J.M.; Ludlow, C.L.; Bansberg, S.F.; Adler, C.H.; Lott, D.G. Spasmodic dysphonia: A review. Part 2: Characterization of pathophysiology. Otolaryngol.—Head Neck Surg. 2017, 157, 558–564. [Google Scholar] [CrossRef]
  17. Patel, R.R.; Liu, L.; Galatsanos, N.; Bless, D.M. Differential Vibratory Characteristics of Adductor Spasmodic Dysphonia and Muscle Tension Dysphonia on High-Speed Digital Imaging. Ann. Otol. Rhinol. Laryngol. 2011, 120, 21–32. [Google Scholar] [CrossRef]
  18. Chen, W.; Woo, P.; Murry, T. Vibratory Onset of Adductor Spasmodic Dysphonia and Muscle Tension Dysphonia: A High-Speed Video Study. J. Voice 2020, 34, 598–603. [Google Scholar] [CrossRef] [PubMed]
  19. Naghibolhosseini, M.; Deliyski, D.D.; Zacharias, S.R.; de Alarcon, A.; Orlikoff, R.F. Temporal segmentation for laryngeal high-speed videoendoscopy in connected speech. J. Voice 2018, 32, 256.e1–256.e12. [Google Scholar] [CrossRef]
  20. Yousef, A.M.; Deliyski, D.D.; Zacharias, S.R.C.; de Alarcon, A.; Orlikoff, R.F.; Naghibolhosseini, M. Spatial segmentation for laryngeal high-speed videoendoscopy in connected speech. J. Voice 2023, 37, 26–36. [Google Scholar] [CrossRef]
  21. Yousef, A.M.; Deliyski, D.D.; Zacharias, S.R.; de Alarcon, A.; Orlikoff, R.F.; Naghibolhosseini, M. A Hybrid Machine-Learning-Based Method for Analytic Representation of the Vocal Fold Edges during Connected Speech. Appl. Sci. 2021, 11, 1179. [Google Scholar] [CrossRef]
  22. Naghibolhosseini, M.; Deliyski, D.D.; Zacharias, S.R.; de Alarcon, A.; Orlikoff, R.F. A method for analysis of the vocal fold vibrations in connected speech using laryngeal imaging. In Proceedings of the 10th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications MAVEBA; Manfredi, C., Ed.; Firenze University Press: Firenze, Italy, 2017. [Google Scholar]
  23. Yousef, A.M.; Deliyski, D.D.; Zacharias, S.R.; de Alarcon, A.; Orlikoff, R.F.; Naghibolhosseini, M. A Deep Learning Approach for Quantifying Vocal Fold Dynamics during Connected Speech using Laryngeal High-Speed Videoendoscopy. J. Speech Lang. Hear. Res. 2022, 65, 2098–2113. [Google Scholar] [CrossRef] [PubMed]
  24. Naghibolhosseini, M.; Deliyski, D.D.; Zacharias, S.R.C.; de Alarcon, A.; Orlikoff, R.F. Studying vocal fold non-stationary behavior during connected speech using high-speed videoendoscopy. J. Acoust. Soc. Am. 2018, 144, 1766. [Google Scholar] [CrossRef]
  25. Orlikoff, R.F.; Deliyski, D.D.; Baken, R.; Watson, B.C. Validation of a glottographic measure of vocal attack. J. Voice 2009, 23, 164–168. [Google Scholar] [CrossRef]
  26. Langeveld, T.P.; Drost, H.A.; Zwinderman, A.H.; Frijns, J.H.; De Jong, R.J.B. Perceptual characteristics of adductor spasmodic dysphonia. Ann. Otol. Rhinol. Laryngol. 2000, 109, 741–748. [Google Scholar] [CrossRef] [PubMed]
  27. Ikuma, T.; Kunduk, M.; Fink, D.; McWhorter, A.J. A spatiotemporal approach to the objective analysis of initiation and termination of vocal-fold oscillation with high-speed videoendoscopy. J. Voice 2016, 30, 756-e21. [Google Scholar] [CrossRef]
  28. Brown, C.; Naghibolhosseini, M.; Zacharias, S.R.C.; Deliyski, D.D. Investigation of Glottal Attack and Offset Times in Norm and Neurogenic Voice Disorder. In Proceedings of the Mid-Michigan Symposium for Undergraduate Research Experiences (MID-SURE), East Lansing, MI, USA, 24 July 2019. [Google Scholar]
  29. De Biase, N.G.; Korn, G.P.; Lorenzon, P.; Padovani, M.; Moraes, M.; Madazio, G.; Vilanova, L.C.P. Dysphonia severity degree and phonation onset latency in laryngeal adductor dystonia. J. Voice 2010, 24, 406–409. [Google Scholar] [CrossRef]
  30. Roy, N.; Gouse, M.; Mauszycki, S.C.; Merrill, R.M.; Smith, M.E. Task specificity in adductor spasmodic dysphonia versus muscle tension dysphonia. Laryngoscope 2005, 115, 311–316. [Google Scholar] [CrossRef]
  31. Erickson, M.L. Effects of voicing and syntactic complexity on sign expression in adductor spasmodic dysphonia. Am. J. Speech-Lang. Pathol. 2003, 12, 416–424. [Google Scholar] [CrossRef]
  32. Sapienza, C.M.; Cannito, M.P.; Murry, T.; Branski, R.; Woodson, G. Acoustic variations in reading produced by speakers with spasmodic dysphonia pre-botox injection and within early stages of post-botox injection. J. Speech Lang. Hear. Res. 2002, 45, 830–843. [Google Scholar] [CrossRef]
  33. Sand, A. Inferential Statistics Is an Unfit Tool for Interpreting Data. Appl. Sci. 2022, 12, 7691. [Google Scholar] [CrossRef]
Figure 1. GAT bar graph for normophonic and AdSD voices. The central horizontal line and the bottom and top edges of the boxes indicate the median, 25th, and 75th percentiles, respectively. The whiskers are indicative of extreme data points after the outlier removal. The individual stars show the outliers.
Figure 1. GAT bar graph for normophonic and AdSD voices. The central horizontal line and the bottom and top edges of the boxes indicate the median, 25th, and 75th percentiles, respectively. The whiskers are indicative of extreme data points after the outlier removal. The individual stars show the outliers.
Applsci 13 02979 g001
Figure 2. GOT bar graph for normophonic and AdSD voices. The central horizontal line and the bottom and top edges of the boxes indicate the median, 25th, and 75th percentiles, respectively. The whiskers are indicative of extreme data points after the outlier removal. The individual stars show the outliers.
Figure 2. GOT bar graph for normophonic and AdSD voices. The central horizontal line and the bottom and top edges of the boxes indicate the median, 25th, and 75th percentiles, respectively. The whiskers are indicative of extreme data points after the outlier removal. The individual stars show the outliers.
Applsci 13 02979 g002
Figure 3. GAT distribution (in ms) for the true vocal folds (bottom panel) and false vocal folds (top panel).
Figure 3. GAT distribution (in ms) for the true vocal folds (bottom panel) and false vocal folds (top panel).
Applsci 13 02979 g003
Figure 4. GOT distribution (in ms) for the true vocal folds (bottom panel) and false vocal folds (top panel).
Figure 4. GOT distribution (in ms) for the true vocal folds (bottom panel) and false vocal folds (top panel).
Applsci 13 02979 g004
Figure 5. GAT values (in ms) for different vocalizations (x-axis) for the normophonic voices (panel (a)) and AdSD (panel (b)). The subject IDs are shown in the legend.
Figure 5. GAT values (in ms) for different vocalizations (x-axis) for the normophonic voices (panel (a)) and AdSD (panel (b)). The subject IDs are shown in the legend.
Applsci 13 02979 g005
Table 1. The characteristics of participants in the normophonic and AdSD groups.
Table 1. The characteristics of participants in the normophonic and AdSD groups.
Non-
Pathological
SexAge
N1MM35
N2FF35
N5FF67
N9MM49
N11FF52
AdSD
(No Tremor)
SexAgeLast
Injection (months)
Length of Time Botox Lasts (months)Length of dx
(years)
CAPE-V
Overall
Severity
CAPE-V
Most Severe Characteristic
S3MM763.52.57Severe Strain
* S4FF67522+Moderate-
Severe
Strain
S6FF67323ModerateStrain
S7FF60424N/AN/A
* S8FF76555+ModerateStrain
* Patient was seen at an outside institution prior to their initial visit to Mayo Clinic, so the exact date of diagnosis is unknown.
Table 2. Study inclusion criteria.
Table 2. Study inclusion criteria.
Inclusion CriteriaAdSD GroupNormophonic Group
At least 18 years of age and proficiency in reading English written text.XX
No prior history of intubation injury or airway/laryngeal surgery.XX
Normal hearing.XX
Absence of structural abnormalities including lesions and/or vocal fold paralysis/paresis.XX
Absence of perceptual symptoms of classical dysarthria.XX
Cognitively intact and able to undergo the flexible HSV protocol.XX
Auditory-perceptual characteristics consistent with the disorder (evidence of phonatory breaks on voiced sounds and a strained-strangled quality, and no obvious tremor during phonation).X
Occasional moments of normal-sounding voice.X
Improved voice for non-speech vocalizations.X
Improved voice quality for phonation at higher pitches.X
Not stimulable for voice change with facilitation techniques (e.g., distraction, improving breath and voice coordination and forward resonance, manual manipulation).X
No recent Botox treatment or surgical treatment for AdSD.X
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Naghibolhosseini, M.; Zacharias, S.R.C.; Zenas, S.; Levesque, F.; Deliyski, D.D. Laryngeal Imaging Study of Glottal Attack/Offset Time in Adductor Spasmodic Dysphonia during Connected Speech. Appl. Sci. 2023, 13, 2979. https://doi.org/10.3390/app13052979

AMA Style

Naghibolhosseini M, Zacharias SRC, Zenas S, Levesque F, Deliyski DD. Laryngeal Imaging Study of Glottal Attack/Offset Time in Adductor Spasmodic Dysphonia during Connected Speech. Applied Sciences. 2023; 13(5):2979. https://doi.org/10.3390/app13052979

Chicago/Turabian Style

Naghibolhosseini, Maryam, Stephanie R. C. Zacharias, Sarah Zenas, Farrah Levesque, and Dimitar D. Deliyski. 2023. "Laryngeal Imaging Study of Glottal Attack/Offset Time in Adductor Spasmodic Dysphonia during Connected Speech" Applied Sciences 13, no. 5: 2979. https://doi.org/10.3390/app13052979

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop