Automatic Segmentation of Membranous Glottal Gap Area with U-Net-Based Architecture

Laryngoscope. 2024 Jan 13. doi: 10.1002/lary.31266. Online ahead of print.

Abstract

Background: Although videostroboscopy is recognized as the most widely used approach for investigating vocal fold function, quantifying measurements such as the membranous glottal gap area remains too time-consuming for clinical use.

Methods: We used a total of 2507 videostroboscopy images from 137 patients and developed five U-Net-based deep-learning image segmentation models for automatic masking of the membranous glottal gap area. To further validate the models, we used another 410 images from 41 different patients.

Results: During development, all five models exhibited acceptable and similar metrics. While the VGG19 U-Net had a long inference time of 1654 ms, the other four models had more practical inference times, ranging from 16 to 138 ms. During further validation, Efficient U-Net demonstrated the highest intersection over union of 0.8455, the highest Dice coefficient of 0.9163, and the lowest Hausdorff distance of 1.5626. The normalized membranous glottal gap area index was also calculated and validated. Efficient U-Net and VGG19 U-Net exhibited the lowest mean squared errors (3.5476 and 3.3842) and the lowest mean absolute errors (1.8835 and 1.8396).
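The overlap and error metrics reported above can be illustrated with a minimal sketch. It assumes binary masks stored as NumPy arrays and index values stored as plain sequences. The function names `iou_and_dice` and `mse_and_mae` and the toy masks are hypothetical, chosen for illustration only, and are not taken from the study's code. The Hausdorff distance is left out to keep the example short.

```python
import numpy as np


def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks are empty: treat as perfect agreement (an assumption).
        return 1.0, 1.0
    return float(intersection / union), float(2 * intersection / total)


def mse_and_mae(predicted, reference):
    """Return (mean squared error, mean absolute error) for paired values."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(diff ** 2)), float(np.mean(np.abs(diff)))


# Toy example: a 4-pixel predicted mask inside a 6-pixel reference mask.
pred_mask = np.zeros((4, 4), dtype=bool)
pred_mask[1:3, 1:3] = True
true_mask = np.zeros((4, 4), dtype=bool)
true_mask[1:3, 1:4] = True

iou, dice = iou_and_dice(pred_mask, true_mask)
mse, mae = mse_and_mae([10.0, 12.0], [11.0, 14.0])
```

In this toy case the intersection is 4 pixels and the union is 6, so the IoU is 4/6 and the Dice coefficient is 8/10. Dice is never lower than IoU for the same pair of masks, which is consistent with the values reported for Efficient U-Net.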

Conclusions: Automatic segmentation of the membranous glottal gap area can be achieved with U-Net-based architectures. Considering both segmentation quality and speed, Efficient U-Net is a reasonable choice for this task, while the other four models remain competitive alternatives. The masked area produced by the models makes it possible to calculate the normalized membranous glottal gap area and to analyze the glottal area waveform, suggesting promising clinical applications for these models.
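As a rough illustration of how a glottal area waveform could be derived from segmentation output, the sketch below counts masked pixels per frame. It assumes the predicted masks are stacked into a NumPy array of shape (frames, height, width). The function name `glottal_area_waveform` is hypothetical, and the areas are in raw pixels: the abstract does not state how the normalized index is defined, so no normalization is attempted here.

```python
import numpy as np


def glottal_area_waveform(masks):
    """Return the masked area (in pixels) of each frame in a mask stack."""
    masks = np.asarray(masks, dtype=bool)
    if masks.ndim != 3:
        raise ValueError("expected an array of shape (frames, height, width)")
    # Flatten each frame and count the pixels marked as glottal gap.
    return masks.reshape(masks.shape[0], -1).sum(axis=1)


# Toy stack of three 4x4 frames: closed glottis, wide gap, narrow gap.
frames = np.zeros((3, 4, 4), dtype=bool)
frames[1, 1:3, 1:3] = True   # 4 masked pixels
frames[2, 1:3, 1] = True     # 2 masked pixels

waveform = glottal_area_waveform(frames)
```

Plotting `waveform` against frame index would give the area-over-time curve; converting pixel counts to a normalized or physical measure would require a reference that the segmentation masks alone do not provide.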

Level of evidence: NA. Laryngoscope, 2024.

Keywords: U-Net; artificial intelligence; membranous glottal gap; segmentation; videostroboscopy.