Spatial Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech

Ahmed M Yousef; Dimitar D Deliyski; Stephanie R C Zacharias; Alessandro de Alarcon; Robert F Orlikoff; Maryam Naghibolhosseini

doi:10.1016/j.jvoice.2020.10.017

J Voice. 2023 Jan;37(1):26-36. doi: 10.1016/j.jvoice.2020.10.017. Epub 2020 Nov 27.

Authors

Ahmed M Yousef¹, Dimitar D Deliyski¹, Stephanie R C Zacharias², Alessandro de Alarcon³, Robert F Orlikoff⁴, Maryam Naghibolhosseini⁵

Affiliations

¹ Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan.
² Head and Neck Regenerative Medicine Program, Center for Regenerative Medicine, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona.
³ Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Otolaryngology Head and Neck Surgery, University of Cincinnati, Ohio.
⁴ College of Allied Health Sciences, East Carolina University, Greenville, North Carolina.
⁵ Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan. Electronic address: naghib@msu.edu.

Abstract

Objective: This study proposes a new computational framework for automated spatial segmentation of the vocal fold edges in high-speed videoendoscopy (HSV) data during connected speech. This spatio-temporal analytic representation of the vocal folds enables the HSV-based measurement of the glottal area waveform and other vibratory characteristics in the context of running speech.

Methods: HSV data were obtained from a vocally normal adult during production of the "Rainbow Passage." An algorithm based on active contour modeling was developed to analyze the HSV data. The algorithm was applied to a series of HSV kymograms, taken at different intersections of the vocal folds, to detect the edges of the vibrating vocal folds across the frames. This edge detection method follows a set of deformation rules that let the active contours capture the vocal fold edges through an energy optimization procedure. The detected edges in the kymograms were then registered back to the HSV frames. Subsequently, the glottal area waveform was calculated as the area of the glottis enclosed by the vocal fold edges in each frame.
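The two bookkeeping steps around the edge detector (building a kymogram at one cross-section, and turning registered edges into a glottal area waveform) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the array layout (frames × rows × columns, with the glottal axis assumed to run vertically so that each image row is one cross-section), and the use of NaN to mark rows without a detected edge are all assumptions made for illustration. The active contour step itself is not reproduced; the edge positions are taken as given.

```python
import numpy as np


def extract_kymogram(video, row):
    """Build a kymogram from one cross-section of an HSV recording.

    video: array of shape (n_frames, height, width), grayscale (assumed layout).
    row:   index of the image row used as the cross-section (hypothetical choice).
    Returns an array of shape (width, n_frames): space on axis 0, time on axis 1.
    """
    return video[:, row, :].T


def glottal_area_waveform(left_edges, right_edges):
    """Compute the glottal area (in pixels) for every frame.

    left_edges, right_edges: float arrays of shape (n_frames, n_rows) holding the
    column position of each vocal fold edge per row, already registered back to
    the frames. NaN marks rows where no edge was detected (assumed convention).
    The area of a frame is the sum of the non-negative edge-to-edge widths.
    """
    widths = np.nan_to_num(right_edges - left_edges, nan=0.0)
    return np.clip(widths, 0.0, None).sum(axis=1)


if __name__ == "__main__":
    # Tiny synthetic "video": 2 frames of 3 x 4 pixels.
    video = np.arange(2 * 3 * 4).reshape(2, 3, 4)
    kymogram = extract_kymogram(video, row=1)
    print(kymogram.shape)

    # Hypothetical edge positions for 2 frames and 2 cross-sections.
    left = np.array([[2.0, 2.0], [3.0, np.nan]])
    right = np.array([[5.0, 4.0], [3.0, np.nan]])
    print(glottal_area_waveform(left, right))
```

Summing per-row widths treats each cross-section as a one-pixel-high strip of the glottis, which is the simplest discrete reading of "the area enclosed by the vocal fold edges"; a polygon-based area would be an equally reasonable alternative.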

Results: The developed algorithm successfully captured the edges of the vocal folds in the HSV kymograms. This method led to an automated measurement of the glottal area waveform from the HSV frames during vocalizations in connected speech.

Conclusion: The proposed algorithm serves as an automated method for spatial segmentation of the vocal folds in HSV data during connected speech. This study is one of the initial steps toward developing HSV-based measures for studying vocal fold vibratory characteristics and voice production mechanisms, in both normal and disordered voices, in the context of connected speech.

Keywords: Connected Speech; Glottal Area Waveform; High-Speed Videoendoscopy; Laryngeal Imaging; Spatial Segmentation; Voice Assessment.

MeSH terms

Larynx*
Phonation
Speech*
Vibration
Video Recording / methods
Vocal Cords