Temporal Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech

J Voice. 2018 Mar;32(2):256.e1-256.e12. doi: 10.1016/j.jvoice.2017.05.014. Epub 2017 Jun 21.

Abstract

Objective: This study proposes a gradient-based method for temporal segmentation of laryngeal high-speed videoendoscopy (HSV) data obtained during connected speech.

Methods: A custom-developed HSV system coupled with a flexible fiberoptic nasolaryngoscope was used to record one vocally normal female participant reading the "Rainbow Passage." A gradient-based algorithm was developed to generate a motion window. When applied to the HSV data, the motion window acted as a filter tracking the location of the vibrating vocal folds. The glottal area waveform was estimated using a statistics-based image-processing approach. The vocal fold vibratory frequency was computed by an autocorrelation-based extraction of the fundamental frequency (f0) from the glottal area waveform. Temporal segmentation was then performed based on the f0 contour and automatic detection of the epiglottic obstructions. Additionally, visual temporal segmentation was performed by viewing the HSV images frame by frame to determine the time points of the vocalization onsets and offsets, and of the epiglottic obstructions of the glottis.
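
As an illustration only, the autocorrelation-based f0 step and the f0-contour segmentation could be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 4000 frames-per-second rate, the f0 search range, the peak threshold, and the rule that contiguous voiced windows form one vocalization are all hypothetical choices, none of which are given in the abstract. Detection of epiglottic obstructions is not covered.

```python
import numpy as np

def estimate_f0_autocorr(gaw, fps, f0_min=75.0, f0_max=500.0, min_peak=0.5):
    """Estimate f0 (Hz) for one window of a glottal area waveform.

    Returns None when no sufficiently strong periodicity is found.
    All default parameter values are illustrative assumptions.
    """
    x = np.asarray(gaw, dtype=float)
    x = x - x.mean()                      # remove the DC offset
    energy = float(np.dot(x, x))
    if energy == 0.0:
        return None                       # flat signal, no vibration
    # Autocorrelation at non-negative lags, normalized so that lag 0 equals 1.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:] / energy
    # Restrict the search to lags matching the assumed f0 range.
    lag_min = max(1, int(np.floor(fps / f0_max)))
    lag_max = min(len(x) - 1, int(np.ceil(fps / f0_min)))
    if lag_max <= lag_min:
        return None
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    if ac[lag] < min_peak:
        return None                       # treat weak peaks as unvoiced
    return fps / lag

def voiced_segments(f0_contour):
    """Return (onset_index, offset_index) pairs for runs of voiced windows.

    Assumes a window is voiced when its f0 value is not None.
    """
    segments, start = [], None
    for i, f0 in enumerate(f0_contour):
        if f0 is not None and start is None:
            start = i
        elif f0 is None and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(f0_contour) - 1))
    return segments

# Synthetic example: a 200 Hz pulse-like waveform sampled at an assumed 4000 fps.
fps = 4000
t = np.arange(400) / fps
synthetic_gaw = np.maximum(0.0, np.sin(2 * np.pi * 200 * t))
f0 = estimate_f0_autocorr(synthetic_gaw, fps)
```

Applying `estimate_f0_autocorr` to successive windows of the waveform would give an f0 contour, and `voiced_segments` would then map that contour to candidate onset and offset window indices.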

Results: The time points resulting from the automatic and visual temporal segmentation methods were cross-validated. The rise-and-fall patterns of the f0 contour produced by the automatic algorithm agreed with visual inspection of the vibratory frequency change in the HSV data.

Conclusions: This study demonstrated the feasibility of automatic temporal segmentation of HSV imaging of connected speech, which allows for mapping the video content into onsets, offsets, and epiglottic obstructions for each vocalization. Automated analysis of HSV imaging of connected speech has significant clinical potential for advancing instrumental voice assessment protocols.

Keywords: Connected speech; High-speed videoendoscopy; Laryngeal imaging; Voice assessment.

Publication types

  • Validation Study

MeSH terms

  • Adult
  • Algorithms
  • Automation
  • Feasibility Studies
  • Female
  • Humans
  • Image Interpretation, Computer-Assisted / methods*
  • Laryngeal Diseases / diagnosis
  • Laryngeal Diseases / physiopathology
  • Laryngoscopy / methods*
  • Larynx / anatomy & histology*
  • Larynx / physiology*
  • Phonation*
  • Predictive Value of Tests
  • Reproducibility of Results
  • Speech Acoustics*
  • Time Factors
  • Vibration
  • Video Recording / methods*
  • Voice Disorders / diagnosis
  • Voice Disorders / physiopathology
  • Voice Quality*