Diagnostic Accuracies of Laryngeal Diseases Using a Convolutional Neural Network-Based Image Classification System

Won Ki Cho; Yeong Ju Lee; Hye Ah Joo; In Seong Jeong; Yeonjoo Choi; Soon Yuhl Nam; Sang Yoon Kim; Seung-Ho Choi

doi:10.1002/lary.29595

Laryngoscope. 2021 Nov;131(11):2558-2566. doi: 10.1002/lary.29595. Epub 2021 May 17.

Authors

Won Ki Cho¹, Yeong Ju Lee¹, Hye Ah Joo¹, In Seong Jeong¹, Yeonjoo Choi¹, Soon Yuhl Nam¹, Sang Yoon Kim¹, Seung-Ho Choi¹

Affiliation

¹ Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea.

PMID: 34000069
DOI: 10.1002/lary.29595

Abstract

Objectives/hypothesis: Diagnosis of laryngeal disease from laryngoscopic images may vary between observers depending on their clinical experience. This study therefore aimed to perform computer-assisted diagnosis of common laryngeal diseases using deep learning-based disease classification models.

Study design: Experimental study with retrospective data.

Methods: A total of 4106 images (cysts, nodules, polyps, leukoplakia, papillomas, Reinke's edema, granulomas, palsies, and normal cases) were analyzed. The diseases were distributed equally into nine folds; stratified eightfold cross-validation was performed on eight folds for training and validation, and the remaining fold was used as a test dataset. The trained model was applied to the test dataset, and model performance was assessed by precision (positive predictive value), recall (sensitivity), accuracy, F1 score, the precision-recall (PR) curve, and the area under the PR curve (PR-AUC). Outcomes were compared with those of visual assessments by four trainees.
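The abstract does not spell out the exact splitting procedure, so the following is only a minimal sketch of one plausible reading: samples are dealt class by class into nine folds, one fold is held out for testing, and the validation fold rotates over the remaining eight. The function names, the toy labels, and the round-robin assignment are assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=9, seed=0):
    """Assign sample indices to n_folds folds, keeping class shares near-equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so every fold receives
        # a near-equal number of samples from this class.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_val_test_splits(folds, test_fold=0):
    """Hold out one fold for testing; rotate validation over the other folds."""
    test = folds[test_fold]
    cv_folds = [f for i, f in enumerate(folds) if i != test_fold]
    splits = []
    for k, val in enumerate(cv_folds):
        train = [idx for j, f in enumerate(cv_folds) if j != k for idx in f]
        splits.append((train, val, test))
    return splits


# Toy data (hypothetical): 18 images for each of three classes.
labels = ["cyst"] * 18 + ["nodule"] * 18 + ["normal"] * 18
folds = stratified_folds(labels)
splits = train_val_test_splits(folds)
```

With nine folds and one held out, the rotation yields eight train/validation pairs that all share the same untouched test fold, which is what keeps the reported test metrics independent of model selection.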

Results: The trained deep neural networks (DNNs) outperformed trainees' visual assessments in discriminating cysts, granulomas, nodules, normal cases, palsies, papillomas, and polyps according to the PR-AUC and F1 score. The DNNs' lowest F1 scores and PR-AUCs were for Reinke's edema (0.720, 0.800) and nodules (0.730, 0.780), but these were comparable to the mean F1 scores of the two best-performing trainees (0.765 and 0.675, respectively). In discriminating papillomas, the F1 score was much higher for DNNs (0.870) than for trainees (0.685). Overall, DNNs outperformed all trainees (micro-average PR-AUC = 0.95; macro-average PR-AUC = 0.91).
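The gap between the micro-average and macro-average figures reflects how the two schemes weight classes: micro-averaging pools every prediction before computing the metric, so frequent classes dominate, whereas macro-averaging computes the metric per class and takes an unweighted mean. The sketch below illustrates the idea with the F1 score rather than PR-AUC, because PR-AUC needs per-image prediction scores that the abstract does not provide. The toy labels and function names are hypothetical.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_micro_f1(y_true, y_pred):
    tp, fp, fn = per_class_counts(y_true, y_pred)
    classes = sorted(set(y_true) | set(y_pred))
    # Macro: unweighted mean of the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: F1 computed once from the pooled counts.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy, imbalanced example (hypothetical): 8 normal images and 2 polyps,
# with one polyp misclassified as normal.
y_true = ["normal"] * 8 + ["polyp"] * 2
y_pred = ["normal"] * 8 + ["normal", "polyp"]
macro, micro = macro_micro_f1(y_true, y_pred)
```

In this toy case the single error falls on the rare class, so the macro average is pulled down more than the micro average, the same direction as the difference reported above.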

Conclusions: DNN technology could be applied to laryngoscopy to supplement examiners' clinical assessment by providing additional diagnostic clues and serving as a diagnostic reference.

Level of evidence: 3. Laryngoscope, 131:2558-2566, 2021.

Keywords: Laryngoscopic images; computer diagnosis; computer-aided diagnosis; deep learning; laryngeal disease; neural networks.

Publication types

Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Datasets as Topic
Deep Learning*
Feasibility Studies
Humans
Image Interpretation, Computer-Assisted / methods*
Laryngeal Diseases / diagnosis*
Laryngoscopy / methods*
Larynx / diagnostic imaging*
Predictive Value of Tests
ROC Curve
Retrospective Studies

Grants and funding

20000843/Ministry of Trade, Industry, and Energy (MOTIE, Korea)