Deep Learning-Based Analysis of Glottal Attack and Offset Times in Adductor Laryngeal Dystonia

Ahmed M Yousef; Dimitar D Deliyski; Mohsen Zayernouri; Stephanie R C Zacharias; Maryam Naghibolhosseini

doi:10.1016/j.jvoice.2023.10.011

J Voice. 2023 Nov 15:S0892-1997(23)00319-3. doi: 10.1016/j.jvoice.2023.10.011. Online ahead of print.

Authors

Ahmed M Yousef¹, Dimitar D Deliyski¹, Mohsen Zayernouri², Stephanie R C Zacharias³, Maryam Naghibolhosseini⁴

Affiliations

¹ Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan.
² Departments of Mechanical Engineering & Statistics and Probability, Michigan State University, East Lansing, Michigan.
³ Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona.
⁴ Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan. Electronic address: naghib@msu.edu.

PMID: 37977969
DOI: 10.1016/j.jvoice.2023.10.011

Abstract

Objective: Diagnosis of adductor laryngeal dystonia (AdLD) is challenging as it mimics voice features of other voice disorders. This could lead to misdiagnosis (or delayed diagnosis) and ineffective treatments of AdLD. This paper develops automated measurements of glottal attack time (GAT) and glottal offset time (GOT) from high-speed videoendoscopy (HSV) in connected speech as objective measures that can potentially facilitate the diagnosis of this disorder in the future.

Methods: HSV data were recorded from vocally normal adults and patients with AdLD during the reading of the "Rainbow Passage" and six CAPE-V (Consensus Auditory-Perceptual Evaluation of Voice) sentences. A deep learning framework was designed and trained to segment the glottal area and detect the vocal fold edges in the HSV dataset. This framework allowed the GATs and GOTs to be measured automatically for the participants. The resulting measurements were then compared between vocally normal speakers and those with AdLD.
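To make the measurement step concrete, the following is a minimal sketch of how timing values could be derived once per-frame glottal segmentation masks are available. It is not the authors' implementation: the abstract does not specify how the GAT and GOT landmarks are located, so the function names, the "first closed frame" rule, and the `closed_below` threshold are all hypothetical, and the frame rate is a caller-supplied assumption.

```python
from typing import List, Optional, Sequence

# One binary mask per HSV frame: rows of 0/1 values, where 1 marks a glottal pixel.
# This representation is an assumption made for illustration.
Mask = Sequence[Sequence[int]]


def glottal_area_waveform(masks: Sequence[Mask]) -> List[int]:
    """Return the glottal area (pixel count) for each frame."""
    return [sum(sum(row) for row in mask) for mask in masks]


def first_closed_frame(
    area: Sequence[float], start: int = 0, closed_below: float = 1
) -> Optional[int]:
    """Return the first frame index >= start whose area is below the threshold.

    Hypothetical landmark rule: treats "area below closed_below" as closure.
    Returns None if no such frame exists (for example, incomplete closure).
    """
    for i in range(start, len(area)):
        if area[i] < closed_below:
            return i
    return None


def interval_ms(start_frame: int, end_frame: int, fps: float) -> float:
    """Convert the span between two frame indices to milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    return 1000.0 * (end_frame - start_frame) / fps
```

Under these assumptions, a GAT-like value would be `interval_ms(onset_frame, contact_frame, fps)`, with the two landmark frames chosen by whatever criteria the analysis adopts; a GOT-like value would use the corresponding offset landmarks in the same way.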

Results: The automated framework was successfully developed and accurately segmented the glottal area/edges. The automated measurements of GAT and GOT showed only minor, nonsignificant differences from the results of manual analysis, with a strong correlation between the automated and manual measures. GAT values differed significantly between the vocally normal subjects and the AdLD patients, and both the GAT and GOT measures showed larger variability in the AdLD group.
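As an illustration of how agreement between automated and manual measurements might be checked, the sketch below computes a Pearson correlation coefficient and the mean paired difference. The abstract does not state which correlation measure or significance test was used, so these two statistics and the function names are assumptions chosen for illustration only.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def mean_paired_difference(
    automated: Sequence[float], manual: Sequence[float]
) -> float:
    """Mean of (automated - manual) over paired measurements, e.g. in ms."""
    if len(automated) != len(manual) or not automated:
        raise ValueError("need two non-empty paired samples of equal length")
    return sum(a - m for a, m in zip(automated, manual)) / len(automated)
```

A correlation near 1 together with a mean paired difference near 0 would be consistent with the kind of agreement the results describe, although a formal comparison would also need an appropriate significance test.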

Conclusions: The developed automated approach for GAT and GOT measurement can be valuable in clinical practice. These quantitative measurements can be used as meaningful biomarkers of the impaired vocal function in AdLD and help its differential diagnosis in the future.

Keywords: Adductor laryngeal dystonia; Connected speech; Deep learning; Glottal attack/offset time; High-speed videoendoscopy; Laryngeal imaging.