Temporal Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech

Maryam Naghibolhosseini; Dimitar D Deliyski; Stephanie R C Zacharias; Alessandro de Alarcon; Robert F Orlikoff

doi:10.1016/j.jvoice.2017.05.014

J Voice. 2018 Mar;32(2):256.e1-256.e12. doi: 10.1016/j.jvoice.2017.05.014. Epub 2017 Jun 21.

Authors

Maryam Naghibolhosseini¹, Dimitar D Deliyski², Stephanie R C Zacharias³, Alessandro de Alarcon⁴, Robert F Orlikoff⁵

Affiliations

¹ Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan. Electronic address: naghib@msu.edu.
² Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan; Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio.
³ Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Division of Speech-Language Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Otolaryngology Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio; Department of Communication Sciences and Disorders, University of Cincinnati, Cincinnati, Ohio.
⁴ Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Otolaryngology Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio.
⁵ College of Allied Health Sciences, East Carolina University, Greenville, North Carolina.

Abstract

Objective: This study proposes a gradient-based method for temporal segmentation of laryngeal high-speed videoendoscopy (HSV) data obtained during connected speech.

Methods: A custom-developed HSV system coupled with a flexible fiberoptic nasolaryngoscope was used to record one vocally normal female participant during reading of the "Rainbow Passage." A gradient-based algorithm was developed to generate a motion window. When applied to the HSV data, the motion window acted as a filter tracking the location of the vibrating vocal folds. The glottal area waveform was estimated using a statistical-based image-processing approach. The vocal fold vibratory frequency was computed by an autocorrelation-based extraction of the fundamental frequency (f₀) from the glottal area waveform. Temporal segmentation was then performed based on the f₀ contour and automatic detection of the epiglottic obstructions. Additionally, visual temporal segmentation was performed by viewing the HSV images frame by frame to determine the time points of the vocalization onsets and offsets, and the epiglottic obstructions of the glottis.
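The autocorrelation-based f₀ step described above can be illustrated with a minimal sketch. The abstract does not give implementation details, so the function name `estimate_f0_autocorr`, the f₀ search range, and the use of a single analysis window are assumptions made for illustration only; this is not the authors' implementation.

```python
import numpy as np

def estimate_f0_autocorr(gaw, frame_rate, f0_min=60.0, f0_max=500.0):
    """Estimate f0 (Hz) of one glottal area waveform window via autocorrelation.

    gaw        : 1-D sequence of glottal area values, one per HSV frame
    frame_rate : HSV frame rate in frames per second
    f0_min/max : assumed search range in Hz (not taken from the source)
    Returns None when the window has no variation or is too short.
    """
    x = np.asarray(gaw, dtype=float)
    x = x - x.mean()                      # remove the DC offset
    if not np.any(x):
        return None                       # flat window: no vibration to measure

    # Autocorrelation at non-negative lags, normalized so that ac[0] == 1.
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    ac = ac / ac[0]

    # Convert the f0 search range into a lag range (in frames).
    lag_min = max(1, int(frame_rate / f0_max))
    lag_max = min(x.size - 1, int(frame_rate / f0_min))
    if lag_max <= lag_min:
        return None

    # The lag of the strongest autocorrelation peak is taken as one period.
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return frame_rate / best_lag
```

Applying such a function to successive short windows of the glottal area waveform would yield an f₀ contour over time; the window length and overlap are further choices that the abstract does not specify.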

Results: The time points resulting from the automatic and visual temporal segmentation methods were cross-validated. The f₀-contour patterns of rise and fall resulting from the automatic algorithm were found to be in agreement with the visual inspection of the vibratory frequency change in the HSV data.

Conclusions: This study demonstrated the feasibility of automatic temporal segmentation of HSV imaging of connected speech, which allows for mapping the video content into onsets, offsets, and epiglottic obstructions for each vocalization. Automated analysis of HSV imaging of connected speech has significant clinical potential for advancing instrumental voice assessment protocols.
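As a rough illustration of how an f₀ contour could be mapped into vocalization onsets and offsets, the sketch below finds contiguous voiced runs. It assumes, purely for illustration, that windows without a detectable f₀ (including those with the glottis obstructed) are marked as NaN; the function name `voiced_segments` and this NaN convention are hypothetical and do not come from the study.

```python
import numpy as np

def voiced_segments(f0_contour):
    """Return (onset, offset) index pairs for contiguous voiced runs.

    f0_contour : 1-D sequence of f0 values, with NaN where no f0 was found
                 (an assumed convention, not stated in the source)
    Each pair is (first voiced index, index just past the last voiced one).
    """
    voiced = ~np.isnan(np.asarray(f0_contour, dtype=float))

    # Pad with False on both sides so runs touching the edges are detected.
    padded = np.concatenate(([False], voiced, [False])).astype(int)
    changes = np.diff(padded)

    onsets = np.flatnonzero(changes == 1)     # unvoiced -> voiced transitions
    offsets = np.flatnonzero(changes == -1)   # voiced -> unvoiced transitions
    return [(int(a), int(b)) for a, b in zip(onsets, offsets)]
```

Multiplying the returned indices by the hop between analysis windows (in seconds) would convert them into time points comparable with those obtained by frame-by-frame visual inspection.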

Keywords: Connected speech; High-speed videoendoscopy; Laryngeal imaging; Voice assessment.

Publication types

Validation Study

MeSH terms

Adult
Algorithms
Automation
Feasibility Studies
Female
Humans
Image Interpretation, Computer-Assisted / methods*
Laryngeal Diseases / diagnosis
Laryngeal Diseases / physiopathology
Laryngoscopy / methods*
Larynx / anatomy & histology*
Larynx / physiology*
Phonation*
Predictive Value of Tests
Reproducibility of Results
Speech Acoustics*
Time Factors
Vibration
Video Recording / methods*
Voice Disorders / diagnosis
Voice Disorders / physiopathology
Voice Quality*

Grants and funding

R01 DC007640/DC/NIDCD NIH HHS/United States