Machine Learning Assessment of Spasmodic Dysphonia Based on Acoustical and Perceptual Parameters
2. Material and Methods
2.1. Perceptual Parameters
- G (Global grade of dysphonia): the overall impression of voice quality deterioration;
- R (Roughness): the impression of irregular F0 and of noise;
- B (Breathiness): turbulent noise related to air escape through the vocal folds;
- I (Intelligibility): the impression of how easily the patient can be understood by the listener;
- F (Fluency): the smoothness of speech production;
- Vo (Voicing): the capability to correctly produce voiced and unvoiced speech, i.e., the speech is voiced or unvoiced when it is actually required to be;
- S (Spasmodicity): the perception of voice breaks, tremor, and strain.
2.2. Acoustical Analysis
2.3. Machine Learning and Statistical Analysis
- For the KNN classifier: the number of neighbors k was varied from 2 to 27. The distance metrics considered were “cityblock”, “chebychev”, “correlation”, “cosine”, “euclidean”, “hamming”, “jaccard”, “mahalanobis”, “minkowski”, “seuclidean”, and “spearman” (according to ). The distance weight was chosen among “equal”, “inverse”, and “squaredinverse”.
- For the SVM classifier: the coding scheme was chosen between “one vs. one” and “one vs. all”. The box constraint and kernel scale were varied between 10^−3 and 10^3. The kernel function was set to Gaussian.
- For random forest: the fitcensemble.m function was used with the aggregation method set to “Bag”. The minimum number of leaves was varied from 2 to 27, and the maximum number of splits from 2 to 27. The split criterion was chosen among “deviance”, “gdi”, and “twoing” (according to ). The number of variables to sample was varied from 1 to 55.
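The hyperparameter searches above can be sketched as follows. This is an illustrative Python re-implementation with scikit-learn (the study used MATLAB's fitcknn/fitcensemble); the data, feature count, and reduced grid are synthetic stand-ins, not the study's material.

```python
# Illustrative sketch (assumed, in Python; the study used MATLAB's fitcknn):
# a KNN hyperparameter grid search analogous to the one described above.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(27, 5))      # synthetic stand-in for acoustical features
y = np.tile([0, 1, 2], 9)         # three severity classes, balanced

param_grid = {
    "n_neighbors": list(range(2, 11)),            # subset of the 2..27 range
    "metric": ["cityblock", "chebyshev", "euclidean", "cosine"],
    "weights": ["uniform", "distance"],           # analog of "equal"/"inverse"
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 2))
```

scikit-learn names the weighting options "uniform"/"distance" rather than MATLAB's "equal"/"inverse"/"squaredinverse"; the grid above is a subset chosen for brevity.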
- It generates synthetic samples in the neighborhood of the instance and computes the distance between each sample and the original observation.
- It uses the complex model that needs to be explained to make predictions on the synthesized data, and these predictions are used to train the simple model.
- The simple model is fitted with sample weights derived from the distances of step 1, so that it identifies the features that contributed most to a specific prediction.
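The three steps can be sketched as a minimal re-implementation of the LIME idea (an assumed illustration, not the authors' code or the reference LIME library), with a toy black-box model whose behavior is known in advance:

```python
# Minimal LIME-style sketch (assumed implementation): perturb an instance,
# weight perturbations by proximity, fit a weighted linear surrogate.
import numpy as np
from sklearn.linear_model import Ridge

def lime_weights(predict_fn, x, n_samples=500, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: sample around the instance and compute distances.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)     # proximity kernel
    # Step 2: query the complex model on the synthetic data.
    yz = predict_fn(Z)
    # Step 3: fit a simple weighted model; its coefficients rank features.
    surrogate = Ridge(alpha=1.0).fit(Z, yz, sample_weight=w)
    return surrogate.coef_

# Toy black box: only feature 0 matters, so its coefficient should dominate.
f = lambda Z: 3.0 * Z[:, 0]
coefs = lime_weights(f, np.array([1.0, 2.0, 3.0]))
print(coefs)
```

For the toy model, the surrogate assigns a large weight to feature 0 and near-zero weights to the others, which is the ranking behavior exploited in the analysis.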
- The percentage of voiced parts in the whole audio signal (% voiced) was the parameter most strongly related to the G value. This may be linked to repeated interruptions that markedly lower voice quality. This result is also supported by two parameters, duration mean and duration max, which highlight the longer time required to utter the word /a’jwɔle/ due to alterations in vocal fold mobility. F0 mean and F1 mean contributed as well; the relevance of F1 mean could be associated with the degree of pharynx constriction.
- The R assessment is linked to F2 median and % voiced, suggesting that roughness is related to tongue movements. Jitter was found to be relevant in 8 out of 28 predictions with LIME, but its total contribution weight was only 3.1%. Interestingly, jitter acted as a “confounding” parameter, causing moderate cases of AdSD to be classified as severe. In 25% of observations, NNE, a noise measure alternative to HNR, was considered relevant, but its total contribution weight was only 2.4%. These discrepancies with literature results [39,41,42] are probably caused by the different tasks: jitter and NNE typically show strong correlations with R when healthy subjects are compared with patients diagnosed with AdSD, but they are probably not relevant parameters for distinguishing among different AdSD severity classes.
- B ratings and F1 median values showed the strongest relationship. As shown in Figure 4, even though PSD III (range 1–1.5 kHz) is considered a relevant parameter, it caused two out of four misclassifications from moderate to mild and should therefore be discarded.
- Spasmodicity is associated with F1 median as well as the medium–high frequency region of the spectrum, described by PSD VIII (range of 3.5–4 kHz).
- Jinnah, H.A.; Berardelli, A.; Comella, C.; Defazio, G.; Delong, M.; Factor, S.; Galpern, W.; Hallett, M.; Ludlow, C.; Perlmutter, J.; et al. The focal dystonias: Current views and challenges for future research. Mov. Disord. 2013, 28, 926–943.
- Hintze, J.M.; Ludlow, C.; Bansberg, S.; Adler, C.; Lott, D.G. Spasmodic Dysphonia: A Review. Part 1: Pathogenic Factors. Otolaryngol. Head Neck Surg. 2017, 157, 551–557.
- Hyodo, M.; Asano, K.; Nagao, A.; Hirose, K.; Nakahira, M.; Yanagida, S.; Nishizawa, N. Botulinum Toxin Therapy: A Series of Clinical Studies on Patients with Spasmodic Dysphonia in Japan. Toxins 2021, 13, 840.
- Prudente, C.N.; Chen, M.; Stipancic, K.; Marks, K.; Samargia-Grivette, S.; Goding, G.; Green, J.; Kimberley, T.J. Effects of low-frequency repetitive transcranial magnetic stimulation in adductor laryngeal dystonia: A safety, feasibility, and pilot study. Exp. Brain Res. 2022, 240, 561–574.
- Dejonckere, P.H.; Neumann, K.J.; Moerman, M.B.J.; Martens, J.P.; Giordano, A.; Manfredi, C. Tridimensional assessment of adductor spasmodic dysphonia pre- and post-treatment with Botulinum toxin. Eur. Arch. Oto-Rhino-Laryngol. 2011, 269, 1195–1203.
- Cantarella, G.; Berlusconi, A.; Maraschi, B.; Ghio, A.; Barbieri, S. Botulinum toxin injection and airflow stability in spasmodic dysphonia. Otolaryngol. Head Neck Surg. 2006, 134, 419–423.
- Suppa, A.; Asci, F.; Saggio, G.; Marsili, L.; Casali, D.; Zarezadeh, Z.; Ruoppolo, G.; Berardelli, A.; Costantini, G. Voice analysis in adductor spasmodic dysphonia: Objective diagnosis and response to botulinum toxin. Park. Relat. Disord. 2020, 73, 23–30.
- Roy, N.; Ma, A.M.; Awan, S.N. Automated acoustic analysis of task dependency in adductor spasmodic dysphonia versus muscle tension dysphonia. Laryngoscope 2013, 124, 718–724.
- Hintze, J.M.; Ludlow, C.L.; Bansberg, S.F.; Adler, C.H.; Lott, D.G. Spasmodic Dysphonia: A Review. Part 2: Characterization of Pathophysiology. Otolaryngol. Head Neck Surg. 2017, 157, 558–564.
- Schlotthauer, G.; Torres, M.E.; Jackson-Menaldi, M.C. A Pattern Recognition Approach to Spasmodic Dysphonia and Muscle Tension Dysphonia Automatic Classification. J. Voice 2010, 24, 346–353.
- Costantini, G.; Di Leo, P.; Asci, F.; Zarezadeh, Z.; Marsili, L.; Errico, V.; Suppa, A.; Saggio, G. Machine Learning based Voice Analysis in Spasmodic Dysphonia: An Investigation of Most Relevant Features from Specific Vocal Tasks. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021), Vienna, Austria, 11–13 February 2021; Volume 4, pp. 103–113.
- Powell, M.E.; Cancio, M.R.; Young, D.; Nock, W.; Abdelmessih, B.; Zeller, A.; Morales, I.P.; Zhang, P.; Garrett, C.G.; Schmidt, D.; et al. Decoding phonation with artificial intelligence (DePAI): Proof of concept. Laryngoscope Investig. Otolaryngol. 2019, 4, 328–334.
- Hu, H.-C.; Chang, S.-Y.; Wang, C.-H.; Li, K.-J.; Cho, H.-Y.; Chen, Y.-T.; Lu, C.-J.; Tsai, T.-P.; Lee, O.K.-S. Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study. J. Med. Internet Res. 2021, 23, e25247.
- Fang, S.-H.; Tsao, Y.; Hsiao, M.-J.; Chen, J.-Y.; Lai, Y.-H.; Lin, F.-C.; Wang, C.-T. Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach. J. Voice 2018, 33, 634–641.
- Berardelli, A.; Abbruzzese, G.; Bertolasi, L.; Cantarella, G.; Carella, F.; Currà, A.; De Grandis, D.; DeFazio, G.; Galardi, G.; Girlanda, P.; et al. Guidelines for the therapeutic use of botulinum toxin in movement disorders. Ital. J. Neurol. Sci. 1997, 18, 261–269.
- Hirano, M. Clinical Examination of Voice; Disorders of Human Communication; Springer: Vienna, Austria, 1981; pp. 1–99.
- Bhuta, T.; Patrick, L.; Garnett, J.D. Perceptual evaluation of voice quality and its correlation with acoustic measurements. J. Voice 2004, 18, 299–304.
- Ricci-Maccarini, A.; Limarzi, M.; Pieri, F.; Stacchini, M.; Lucchini, E.; Magnami, M. Refertazione e interpretazione dei tracciati e dei questionari nello studio della disfonia. In Refertazione e Interpretazione dei Tracciati e dei Questionari in ORL; TorGraf: Lecce, Italy, 2002; pp. 285–320.
- Dejonckere, P.H.; Bradley, P.; Clemente, P.; Cornut, G.; Crevier-Buchman, L.; Friedrich, G.; Van de Heyning, P.; Remacle, M.; Woisard, V. A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Eur. Arch. Otorhinolaryngol. 2001, 258, 77–82.
- Moerman, M.B.J.; Martens, J.; Van der Borgt, M.; Peleman, M.; Gillis, M.; Dejonckere, P.H. Perceptual evaluation of substitution voices: Development and evaluation of the (I)INFVo rating scale. Eur. Arch. Otorhinolaryngol. 2006, 263, 183–187.
- Siemons-Lühring, D.I.; Moerman, M.; Martens, J.-P.; Deuster, D.; Müller, F.; Dejonckere, P. Spasmodic dysphonia, perceptual and acoustic analysis: Presenting new diagnostic tools. Eur. Arch. Oto-Rhino-Laryngol. 2009, 266, 1915–1922.
- Morelli, M.S.; Orlandi, S.; Manfredi, C. BioVoice: A multipurpose tool for voice analysis. Biomed. Signal Process. Control 2020, 64, 102302.
- Manfredi, C.; Bocchi, L.; Cantarella, G. A multipurpose user-friendly tool for voice analysis: Application to pathological adult voices. Biomed. Signal Process. Control 2009, 4, 212–220.
- Manfredi, C.; Bandini, A.; Melino, D.; Viellevoye, R.; Kalenga, M.; Orlandi, S. Automated detection and classification of basic shapes of newborn cry melody. Biomed. Signal Process. Control 2018, 45, 174–181.
- Bandini, A.; Giovannelli, F.; Orlandi, S.; Barbagallo, S.; Cincotta, M.; Vanni, P.; Manfredi, C. Automatic identification of dysprosody in idiopathic Parkinson’s disease. Biomed. Signal Process. Control 2015, 17, 47–54.
- Frassineti, L.; Calà, F.; Sforza, E.; Onesimo, R.; Leoni, C.; Lanatà, A.; Zampino, G.; Manfredi, C. Quantitative acoustical analysis of genetic syndromes in the number listing task. Biomed. Signal Process. Control 2023; accepted.
- Manfredi, C.; Altamore, V.; Bandini, A.; Orlandi, S.; Battilocchi, L.; Cantarella, G. Effect of Protective Masks on Voice Parameters: Acoustical Analysis of Sustained Vowels. Proc. Model. Anal. Vocal Emiss. Biomed. Appl. 2021, 8, 171–174.
- Teixeira, J.P.; Oliveira, C.; Lopes, C. Vocal acoustic analysis - jitter, shimmer and HNR parameters. Procedia Technol. 2013, 9, 1112–1122.
- Kasuya, H.; Ogawa, S.; Mashima, K.; Ebihara, S. Normalized noise energy as an acoustic measure to evaluate pathologic voice. J. Acoust. Soc. Am. 1986, 80, 1329–1334.
- Rajula, H.S.R.; Verlato, G.; Manchia, M.; Antonucci, N.; Fanos, V. Comparison of Conventional Statistical Methods with Machine Learning in Medicine: Diagnosis, Drug Development, and Treatment. Medicina 2020, 56, 455.
- Healy, B.C. Machine and deep learning in MS research are just powerful statistics - No. Mult. Scler. J. 2021, 27, 663–664.
- Bzdok, D.; Altman, N.; Krzywinski, M. Statistics versus machine learning. Nat. Methods 2018, 15, 233–234.
- Bur, A.M.; Shew, M.; New, J. Artificial Intelligence for the Otolaryngologist: A State of the Art Review. Otolaryngol. Head Neck Surg. 2019, 160, 603–611.
- MATLAB and Statistics Toolbox Release 2020b; The MathWorks, Inc.: Natick, MA, USA, 2020.
- Harar, P.; Galaz, Z.; Alonso-Hernandez, J.B.; Mekyska, J.; Burget, R.; Smekal, Z. Towards robust voice pathology detection. Neural Comput. Appl. 2018, 32, 15747–15757.
- MATLAB. fitcknn. Available online: https://www.mathworks.com/help/stats/fitcknn.html (accessed on 20 November 2022).
- MATLAB. fitcensemble. Available online: https://www.mathworks.com/help/stats/fitcensemble.html (accessed on 20 November 2022).
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
- Dejonckere, P.H.; Lebacq, J. Acoustic, perceptual, aerodynamic and anatomical correlations in voice pathology. ORL J. Otorhinolaryngol. Relat. Spec. 1996, 58, 326–332.
- Dejonckere, P.H.; Remacle, M.; Fresnel-Elbaz, E.; Woisard, V.; Crevier-Buchman, L.; Millet, B. Differentiated perceptual evaluation of pathological voice quality: Reliability and correlations with acoustic measurements. Rev. Laryngol. Otol. Rhinol. 1996, 117, 219–224.
- Park, J.W.; Kim, B.; Oh, J.H.; Kang, T.K.; Kim, D.Y.; Woo, J.H. Study for Correlation between Objective and Subjective Voice Parameters in Patients with Dysphonia. J. Korean Soc. Laryngol. Phoniatr. Logop. 2019, 30, 118–123.
- Narasimhan, S.; Rashmi, R. Multiparameter Voice Assessment in Dysphonics: Correlation Between Objective and Perceptual Parameters. J. Voice 2020, 36, 335–343.
- Dejonckere, P.H.; Neumann, K.; Moerman, M.; Martens, J.P. Perceptual and acoustic assessment of adductor spasmodic dysphonia pre- and posttreatment with botulinum toxin. In Proceedings of the 3rd Advanced Voice Function Assessment International Workshop, Madrid, Spain, 18–20 May 2009; pp. 169–172.
- Deller, J.R.; Hansen, J.H.L.; Proakis, J.G. Discrete-Time Processing of Speech Signals; Prentice Hall PTR: Upper Saddle River, NJ, USA, 1993.
[Table: values of the perceptual indices and the corresponding classes (columns: Values of Perceptual Indices, Class, Class Values)]
[Table: number of patients per class (columns: Class, Number of Patients)]
| Perceptual Index | LOSO Cross-Validation Accuracy | Model | Hyperparameters |
|---|---|---|---|
| G | 82% | KNN | K = 2; d = spearman; w = inverse |
| R | 86% | KNN | K = 2; d = spearman; w = equal |
| B | 79% | KNN | K = 7; d = cityblock; w = squaredinverse |
| Intelligibility | 54% | KNN | K = 17; d = seuclidean; w = squaredinverse |
| Fluency | 68% | RF | C = twoing; v = 48; l = 2; s = 1; N = 20 |
| Voicing | 61% | KNN | K = 11; d = spearman; w = equal |
| Spasmodicity | 71% | KNN | K = 3; d = seuclidean; w = inverse |
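The LOSO (leave-one-subject-out) scheme behind the accuracies above can be sketched with scikit-learn's LeaveOneGroupOut. This is an assumed Python analog (the study used MATLAB); the data, subject grouping, and KNN settings are illustrative stand-ins:

```python
# Sketch (assumed Python analog; the study used MATLAB): LOSO cross-validation
# holds out all recordings of one subject at each fold.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))           # 3 recordings for each of 10 subjects
groups = np.repeat(np.arange(10), 3)   # subject IDs: the "subject" in LOSO
y = np.tile([0, 1, 2], 10)             # synthetic severity labels

clf = KNeighborsClassifier(n_neighbors=2, metric="cityblock", weights="distance")
scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)
print(len(scores), scores.mean())      # one accuracy per held-out subject
```

Grouping by subject prevents recordings from the same speaker appearing in both training and test folds, which would otherwise inflate accuracy.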
| Perceptual Index | Most Relevant Acoustical Parameters |
|---|---|
| G | % voiced, F0 mean |
| R | F2 median, % voiced |
| B | F1 median, PSD III |
| Intelligibility | F0 max, T0(F0 max) |
| Voicing | F0 mean, PSD I |
| Spasmodicity | F1 median, PSD VIII |

| Perceptual Index | Most Relevant Perceptual Parameters |
|---|---|
| G | Spasmodicity, R, Voicing |
| Parameter | Class 1 (Severe) | Class 2 (Moderate) | Class 3 (Mild) |
|---|---|---|---|
| Precision | 0.92 (0.75–1.00) | 0.86 (0.67–1.00) | 1.00 (0.00–1.00) |
| Sensitivity | 0.92 (0.73–1.00) | 0.92 (0.75–1.00) | 0.50 (0.00–1.00) |
| Specificity | 0.93 (0.77–1.00) | 0.87 (0.67–1.00) | 1.00 (0.00–1.00) |
| F-score | 0.92 (0.78–1.00) | 0.89 (0.74–1.00) | 0.67 (0.00–1.00) |
| AUC | 0.91 (0.77–1.00) | 0.87 (0.71–1.00) | 0.69 (0.00–1.00) |
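The per-class values in the table derive from a one-vs-rest reading of the confusion matrix. The following sketch computes precision, sensitivity, specificity, and F-score that way, using a hypothetical confusion matrix (not the study's actual data):

```python
import numpy as np

def per_class_metrics(cm, k):
    """One-vs-rest precision, sensitivity, specificity, F-score for class k."""
    cm = np.asarray(cm, dtype=float)
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp          # predicted as k, actually another class
    fn = cm[k, :].sum() - tp          # actually k, predicted as another class
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, specificity, f_score

# Hypothetical 3-class confusion matrix (rows = true, cols = predicted).
cm = [[11, 1, 0],
      [1, 11, 0],
      [0, 1, 1]]
print([round(v, 2) for v in per_class_metrics(cm, 0)])
# → [0.92, 0.92, 0.93, 0.92]
```

The bootstrapped confidence intervals in the table would be obtained by resampling the test predictions and recomputing these quantities.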
Acoustical parameters with significant p-values (in parentheses), grouped as in the source table:
- F0 mean (0.012), F0 median (0.043), F0 max (0.003)
- F0 max (0.014), F1 mean (0.037), F2 mean (0.038), F2 median (0.006), % voiced (0.045)
- F0 mean (0.041), F0 max (0.008), F1 std (0.045), Pause duration min (0.049), PSD I (0.010)
- F0 mean (0.005), F0 max (0.001), F2 min (0.030)
- F1 mean (0.018), F1 median (0.050), F1 min (0.004), F2 mean (0.024), F2 median (0.032), F2 max (0.043), % voiced (0.024), Number of units (0.050), Number of pauses (0.050)
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Calà, F.; Frassineti, L.; Manfredi, C.; Dejonckere, P.; Messina, F.; Barbieri, S.; Pignataro, L.; Cantarella, G. Machine Learning Assessment of Spasmodic Dysphonia Based on Acoustical and Perceptual Parameters. Bioengineering 2023, 10, 426. https://doi.org/10.3390/bioengineering10040426