Diagnosis of Early Glottic Cancer Using Laryngeal Image and Voice Based on Ensemble Learning of Convolutional Neural Network Classifiers

Ickhwan Kwon; Soo-Geun Wang; Sung-Chan Shin; Yong-Il Cheon; Byung-Joo Lee; Jin-Choon Lee; Dong-Won Lim; Cheolwoo Jo; Youngseuk Cho; Bum-Joo Shin

doi:10.1016/j.jvoice.2022.07.007

J Voice. 2022 Sep 5:S0892-1997(22)00209-0. doi: 10.1016/j.jvoice.2022.07.007. Online ahead of print.

Affiliations

¹ Department of Applied IT and Engineering, Pusan National University, Miryang, Gyeongsangnam-do, South Korea.
² Department of Otorhinolaryngology-Head and Neck Surgery, College of Medicine, Pusan National University and Medical Research Institute, Pusan National University Hospital, Busan, South Korea.
³ Department of Otorhinolaryngology-Head and Neck Surgery, Pusan National University Yangsan Hospital, Yangsan, Gyeongsangnam-do, South Korea.
⁴ Department of Otorhinolaryngology-Head and Neck Surgery, Pusan National University Hospital, Busan, South Korea.
⁵ School of Electrical, Electronics & Control Engineering, Changwon National University, Changwon, South Korea.
⁶ Department of Statistics, College of Natural Sciences, Pusan National University, Busan, South Korea.
⁷ Department of Applied IT and Engineering, Pusan National University, Miryang, Gyeongsangnam-do, South Korea. Electronic address: voicebjshin@gmail.com.

PMID: 36075802
DOI: 10.1016/j.jvoice.2022.07.007

Abstract

Objectives: The purpose of this study is to improve classification accuracy in the diagnosis of glottic cancer by comparing the results of decision tree ensemble learning, a method for increasing classification accuracy on relatively small datasets, with the results of individual convolutional neural network (CNN) classifiers.

Methods: The Pusan National University Hospital (PNUH) dataset was used to establish the classifiers, and the Pusan National University Yangsan Hospital (PNUYH) dataset was used to verify the performance of the resulting models. For the diagnosis of glottic cancer, deep learning-based CNN models were built to classify laryngeal image and voice data. Decision tree ensemble learning was then performed on the class probabilities produced by the CNN classifiers, using the classification and regression tree (CART) method. Finally, we fused the laryngeal image and voice decision tree classifiers and compared the classification accuracy of decision tree ensemble learning with that of the individual CNN classifiers.
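The fusion step described in the Methods can be illustrated with a minimal sketch. It assumes each person has one cancer probability from an image CNN and one from a voice CNN, and fits a small CART-style tree (Gini impurity, binary threshold splits) on those two values. The probabilities, the tree depth, and all function names below are hypothetical illustrations, not the authors' data or implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:  # keep the first strictly better split
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Grow a tree; leaves are class labels, nodes are (feature, threshold, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical per-person CNN outputs: (p_cancer_from_image, p_cancer_from_voice).
# These numbers are invented for illustration only.
probs = [(0.90, 0.80), (0.80, 0.40), (0.40, 0.90), (0.70, 0.70),
         (0.20, 0.30), (0.50, 0.20), (0.20, 0.50), (0.10, 0.10)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = cancer, 0 = non-cancer

tree = build_tree(probs, labels)

# Baselines: threshold each CNN's probability at 0.5 on its own.
image_only_acc = accuracy([int(p[0] > 0.5) for p in probs], labels)
voice_only_acc = accuracy([int(p[1] > 0.5) for p in probs], labels)
fused_acc = accuracy([predict(tree, p) for p in probs], labels)

print("image only:", image_only_acc)
print("voice only:", voice_only_acc)
print("fused tree:", fused_acc)
```

On this toy data each single-modality threshold misclassifies one person, while the tree that sees both probabilities separates the classes. These are training-set figures on invented numbers, so they say nothing about the accuracies reported in the study; a real evaluation would score the tree on held-out data such as an external dataset.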

Results: The established laryngeal image and voice classification models achieved classification accuracies of 81.03% and 99.18%, respectively, on the PNUH training dataset. However, the classification accuracy of the CNN classifiers decreased to 73.88% for voice and 68.92% for laryngeal image on the external PNUYH dataset. To solve this problem, decision tree ensemble learning of laryngeal image and voice was used, and classification accuracy was improved by integrating the laryngeal image and voice data of the same person. The classification accuracy was 87.88% and 89.06% for the individualized laryngeal image and voice decision tree models, respectively, and the fusion of the laryngeal image and voice decision tree results yielded a classification accuracy of 95.31%.

Conclusion: The results of our study suggest that decision tree ensemble learning, which trains multiple classifiers, is useful for increasing classification accuracy despite a small dataset. Although a large amount of data is essential for AI analysis, high diagnostic classification accuracy can be expected when an integrated approach that combines various input data is taken.

Keywords: Diagnosis; Glottic cancer; Laryngeal image and voice; Ensemble learning; Convolutional neural network classifiers.