Diagnosis of Early Glottic Cancer Using Laryngeal Image and Voice Based on Ensemble Learning of Convolutional Neural Network Classifiers

J Voice. 2022 Sep 5:S0892-1997(22)00209-0. doi: 10.1016/j.jvoice.2022.07.007. Online ahead of print.

Abstract

Objectives: The purpose of study is to improve the classification accuracy by comparing the results obtained by applying decision tree ensemble learning, which is one of the methods to increase the classification accuracy for a relatively small dataset, with the results obtained by the convolutional neural network (CNN) algorithm for the diagnosis of glottal cancer.

Methods: Pusan National University Hospital (PNUH) dataset were used to establish classifiers and Pusan National University Yangsan Hospital (PNUYH) dataset were used to verify the classifier's performance in the generated model. For the diagnosis of glottic cancer, deep learning-based CNN models were established and classified using laryngeal image and voice data. Classification accuracy was obtained by performing decision tree ensemble learning using probability through CNN classification algorithm. In this process, the classification and regression tree (CART) method was used. Then, we compared the classification accuracy of decision tree ensemble learning with CNN individual classifiers by fusing the laryngeal image with the voice decision tree classifier.

Results: We obtained classification accuracy of 81.03 % and 99.18 % in the established laryngeal image and voice classification models using PNUH training dataset, respectively. However, the classification accuracy of CNN classifiers decreased to 73.88 % in voice and 68.92 % in laryngeal image when using an external dataset of PNUYH. To solve this problem, decision tree ensemble learning of laryngeal image and voice was used, and the classification accuracy was improved by integrating data of laryngeal image and voice of the same person. The classification accuracy was 87.88 % and 89.06 % for the individualized laryngeal image and voice decision tree model respectively, and the fusion of the laryngeal image and voice decision tree results represented a classification accuracy of 95.31 %.

Conclusion: The results of our study suggest that decision tree ensemble learning aimed at training multiple classifiers is useful to obtain an increased classification accuracy despite a small dataset. Although a large data amount is essential for AI analysis, when an integrated approach is taken by combining various input data high diagnostic classification accuracy can be expected.

Keywords: Diagnosis— Glottic cancer — Laryngeal image and voice — Ensemble learning — Convolutional neural network classifiers.