Spatial Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech

J Voice. 2023 Jan;37(1):26-36. doi: 10.1016/j.jvoice.2020.10.017. Epub 2020 Nov 27.

Abstract

Objective: This study proposes a new computational framework for automated spatial segmentation of the vocal fold edges in high-speed videoendoscopy (HSV) data during connected speech. This spatio-temporal analytic representation of the vocal folds enables the HSV-based measurement of the glottal area waveform and other vibratory characteristics in the context of running speech.

Methods: HSV data were obtained from a vocally normal adult during production of the "Rainbow Passage." An algorithm based on active contour modeling was developed to analyze the HSV data. The algorithm was applied to a series of HSV kymograms, extracted at different cross-sections of the vocal folds, to detect the edges of the vibrating vocal folds across the frames. The edge detection method applies a set of deformation rules to the active contours so that they converge on the vocal fold edges through an energy optimization procedure. The edges detected in the kymograms were then registered back to the HSV frames. Subsequently, the glottal area waveform was calculated from the area of the glottis enclosed by the vocal fold edges in each frame.
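
To make the first and last steps of this pipeline concrete, the following is a minimal sketch, not the authors' implementation. It assumes each frame is a grid of pixel intensities, that a kymogram is built by stacking one horizontal scan line from every frame, and that the left and right vocal fold edges have already been detected and registered back to the frames as one x-position per scan line. The active contour step itself is not shown. The function names `extract_kymogram` and `glottal_area_waveform` are hypothetical.

```python
def extract_kymogram(video, row):
    """Stack the same horizontal scan line from every frame.

    video: list of frames; each frame is a list of rows of pixel intensities.
    row:   index of the scan line (assumed to cross the glottis).
    Returns kymogram[t][x], one scan line per frame.
    """
    return [list(frame[row]) for frame in video]


def glottal_area_waveform(left_edges, right_edges):
    """Glottal area per frame, in pixels.

    left_edges[t][r] and right_edges[t][r] are the x-positions of the two
    vocal fold edges at scan line r in frame t (assumed already detected).
    Where the edges meet or cross, the glottis is treated as closed and
    that scan line contributes zero area.
    """
    waveform = []
    for left_row, right_row in zip(left_edges, right_edges):
        area = sum(max(r - l, 0) for l, r in zip(left_row, right_row))
        waveform.append(area)
    return waveform


# Toy example: three frames, three scan lines; the glottis opens in frame 1.
left = [[5, 5, 5], [4, 3, 4], [5, 5, 5]]
right = [[5, 5, 5], [6, 7, 6], [5, 5, 5]]
print(glottal_area_waveform(left, right))
```

Summing edge-to-edge widths over scan lines is a simple pixel-count approximation of the enclosed area; converting it to physical units would require a calibration that the abstract does not describe.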

Results: The developed algorithm successfully captured the edges of the vocal folds in the HSV kymograms. This method led to an automated measurement of the glottal area waveform from the HSV frames during vocalizations in connected speech.

Conclusion: The proposed algorithm serves as an automated method for spatial segmentation of the vocal folds in HSV data during connected speech. This study is one of the initial steps toward developing HSV-based measures for studying vocal fold vibratory characteristics and voice production mechanisms, in both normal and disordered voices, in the context of connected speech.

Keywords: Connected Speech; Glottal Area Waveform; High-Speed Videoendoscopy; Laryngeal Imaging; Spatial Segmentation; Voice Assessment.

MeSH terms

  • Larynx*
  • Phonation
  • Speech*
  • Vibration
  • Video Recording / methods
  • Vocal Cords