Vision-Based Assistance for Vocal Fold Identification in Laryngoscopy with Knowledge Distillation

Thao Thi Phuong Dao; Minh-Khoi Pham; Mai-Khiem Tran; Chanh Cong Ha; Boi Ngoc Van; Bich Anh Tran; Minh-Triet Tran

doi:10.3233/SHTI231104


Stud Health Technol Inform. 2024 Jan 25;310:946-950. doi: 10.3233/SHTI231104.

Authors

Thao Thi Phuong Dao¹,²,³,⁴, Minh-Khoi Pham⁵, Mai-Khiem Tran¹,²,³, Chanh Cong Ha⁶, Boi Ngoc Van⁷, Bich Anh Tran⁸, Minh-Triet Tran¹,²,³

Affiliations

¹ University of Science, VNU-HCMC, Ho Chi Minh City, Vietnam.
² John von Neumann Institute, VNU-HCMC, Ho Chi Minh City, Vietnam.
³ Vietnam National University, Ho Chi Minh City, Vietnam.
⁴ Otorhinolaryngology Department, Thong Nhat Hospital, Ho Chi Minh City, Vietnam.
⁵ Dublin City University, Dublin, Ireland.
⁶ Otorhinolaryngology Department, 7A Military Hospital, Ho Chi Minh City, Vietnam.
⁷ Otorhinolaryngology Department, Vinmec Central Park International Hospital, Ho Chi Minh City, Vietnam.
⁸ Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Vietnam.

PMID: 38269948
DOI: 10.3233/SHTI231104

Abstract

Laryngoscopy images play a vital role at the intersection of computer vision and otorhinolaryngology research. However, few studies provide laryngeal datasets for comparative evaluation. This study therefore introduces a novel dataset of vocal fold images. We also propose a lightweight network trained with knowledge distillation: our student model achieves about 98.4% accuracy, comparable to the original EfficientNetB1, while reducing the number of model weights by up to 88%. Finally, we present an AI-assisted smartphone solution, a portable and intelligent laryngoscopy system that helps laryngoscopists efficiently locate vocal fold areas for observation and diagnosis. In summary, our contributions are a laryngeal image dataset and a compressed version of EfficientNetB1 suitable for handheld laryngoscopy devices.
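The abstract does not state which distillation objective the student was trained with. For orientation only, here is a minimal NumPy sketch of the widely used soft-target loss of Hinton et al.: a weighted sum of ordinary cross-entropy on the true labels and a temperature-softened KL divergence between teacher and student predictions. The function names and the default values temperature = 4 and alpha = 0.5 are illustrative assumptions, not settings reported by the authors.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Hinton-style KD loss (an assumed, generic formulation):
    alpha * CE(labels, student) + (1 - alpha) * T^2 * KL(teacher_T || student_T).
    temperature and alpha are hypothetical defaults chosen for illustration."""
    student_logits = np.asarray(student_logits, dtype=float)
    teacher_logits = np.asarray(teacher_logits, dtype=float)
    labels = np.asarray(labels)
    n = student_logits.shape[0]
    eps = 1e-12  # guards against log(0)

    # Hard-label cross-entropy of the student at temperature 1
    p_student = softmax(student_logits)
    hard_ce = -np.mean(np.log(p_student[np.arange(n), labels] + eps))

    # KL divergence between softened teacher and student distributions
    p_teacher_t = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = np.sum(p_teacher_t * (np.log(p_teacher_t + eps) - np.log(p_student_t + eps)),
                axis=-1)
    # Scale by T^2 so soft-target gradients keep a comparable magnitude
    soft = np.mean(kl) * temperature ** 2

    return alpha * hard_ce + (1.0 - alpha) * soft
```

Setting `alpha=1.0` reduces this to plain cross-entropy training; lowering it shifts weight toward matching the teacher's softened class distribution, which is what lets a much smaller student approach the teacher's accuracy.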

Keywords: Laryngoscopy; knowledge distillation; vision-based assistance; vocal folds.

MeSH terms

Humans
Intelligence
Knowledge
Laryngoscopy
Larynx*
Vocal Cords*