Machine learning based estimation of hoarseness severity using sustained vowels

J Acoust Soc Am. 2024 Jan 1;155(1):381-395. doi: 10.1121/10.0024341.

Abstract

Auditory perceptual evaluation is considered the gold standard for assessing voice quality, but its reliability is limited by inter-rater variability and coarse rating scales. This study investigates a continuous, objective approach to evaluating hoarseness severity that combines machine learning (ML) and sustained phonation. For this purpose, 635 acoustic recordings of the sustained vowel /a/ and subjective ratings based on the roughness, breathiness, and hoarseness (RBH) scale were collected from 595 subjects. A total of 50 temporal, spectral, and cepstral features were extracted from each recording and used to identify suitable ML algorithms. Using variance and correlation analysis followed by backward elimination, a subset of relevant features was selected. Recordings were classified into two levels of hoarseness, H<2 and H≥2, yielding a continuous probability score ŷ∈[0,1]. An accuracy of 0.867 and a correlation of 0.805 between the model's predictions and subjective ratings were obtained using only five acoustic features and logistic regression (LR). Further examination of recordings pre- and post-treatment revealed high qualitative agreement with the change in subjectively determined hoarseness levels. Quantitatively, a moderate correlation of 0.567 was obtained. This quantitative approach to hoarseness severity estimation shows promising results and potential for improving the assessment of voice quality.
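
The classification step described above (a binary split at H<2 versus H≥2 whose class probability serves as a continuous severity score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic rather than the study's acoustic features, the function names (`fit_logistic`, `predict_proba`) are hypothetical, and the plain gradient-descent fit stands in for whatever training procedure the authors actually used.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function, maps any real z into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the mean log-loss (illustrative, not the paper's solver).
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Probability of the "H >= 2" class, used as a continuous score in [0, 1].
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Synthetic stand-in data: five made-up "features" per recording.
# Label 1 plays the role of H >= 2, label 0 the role of H < 2.
rng = random.Random(0)
N_FEATURES = 5
X, y = [], []
for i in range(200):
    label = i % 2
    center = 1.0 if label else -1.0
    X.append([rng.gauss(center, 1.0) for _ in range(N_FEATURES)])
    y.append(label)

w, b = fit_logistic(X, y)
scores = predict_proba(X, w, b)

# Thresholding the score at 0.5 recovers the binary decision.
accuracy = sum((s >= 0.5) == bool(t) for s, t in zip(scores, y)) / len(y)
print(f"training accuracy on synthetic data: {accuracy:.3f}")
```

The point of the sketch is only that a single fitted model yields both outputs mentioned in the abstract: the thresholded label used to compute accuracy, and the unthresholded probability that can be correlated with graded subjective ratings. The accuracy printed here reflects the synthetic data and has no relation to the reported 0.867.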

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acoustics
  • Dysphonia*
  • Hoarseness* / diagnosis
  • Humans
  • Phonation
  • Reproducibility of Results
  • Speech Acoustics
  • Speech Production Measurement
  • Voice Quality