Identification of Benign and Malignant Lung Nodules in CT Images Based on Ensemble Learning Method

Interdiscip Sci. 2022 Mar;14(1):130-140. doi: 10.1007/s12539-021-00472-1. Epub 2021 Nov 2.

Abstract

Background and objective: Physicians urgently need computer-aided technology that provides objective decision support. To reduce the false positive rate of pulmonary nodule detection in CT images and to improve the accuracy of lung nodule recognition, this paper proposes an ensemble learning method for distinguishing between malignant and benign pulmonary nodules.

Methods: First, a multi-layer feature fusion YOLOv3 network, trained on a public data set, was used to detect lung nodules. Second, a CNN was trained to differentiate benign from malignant pulmonary nodules. Then, following the idea of ensemble learning, the confidence probabilities output by these two models and the labels of the training set were used as data features to build a logistic regression model. Finally, two test sets (one public and one private) were evaluated: the confidence probabilities output by the two models were fed into the fitted logistic regression model to classify pulmonary nodules as benign or malignant.
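The fusion step can be illustrated with a minimal stacking sketch. It is not the authors' implementation: the feature names `p_det` (detector confidence) and `p_cls` (CNN malignancy probability), the synthetic training data, and the hand-written gradient-descent fit are all assumptions made for illustration, and the abstract does not specify how the logistic regression was actually fitted.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit [bias, w_det, w_cls] by batch gradient descent on the log loss.

    features: list of (p_det, p_cls) confidence pairs from the two base models.
    labels:   list of 0 (benign) / 1 (malignant) training labels.
    """
    w = [0.0, 0.0, 0.0]
    n = len(features)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (p_det, p_cls), y in zip(features, labels):
            err = sigmoid(w[0] + w[1] * p_det + w[2] * p_cls) - y
            grad[0] += err
            grad[1] += err * p_det
            grad[2] += err * p_cls
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def predict_proba(w, p_det, p_cls):
    """Fused probability that a nodule is malignant."""
    return sigmoid(w[0] + w[1] * p_det + w[2] * p_cls)


# Hypothetical training data: synthetic confidences, NOT values from the paper.
random.seed(0)
features, labels = [], []
for _ in range(100):
    # Malignant nodules: both base models tend to output higher confidences.
    features.append((random.uniform(0.4, 1.0), random.uniform(0.4, 1.0)))
    labels.append(1)
    # Benign nodules: both base models tend to output lower confidences.
    features.append((random.uniform(0.0, 0.6), random.uniform(0.0, 0.6)))
    labels.append(0)

weights = fit_logistic(features, labels)

# Fuse the two confidences for a new (hypothetical) nodule.
fused = predict_proba(weights, p_det=0.85, p_cls=0.70)
print("fused malignancy probability: %.3f" % fused)
print("predicted label:", "malignant" if fused >= 0.5 else "benign")
```

In this sketch the two base-model outputs are the only inputs to the meta-model, so the logistic regression simply learns how much weight to give each model's confidence. The 0.5 decision threshold is also an assumption; the abstract does not state which threshold was used.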

Results: The trained YOLOv3 network was applied to the chest CT images of the test sets and detected 356 and 314 pulmonary nodules in the public and private test sets, respectively. Its accuracy, sensitivity and specificity on the two test sets were 80.97%, 81.63% and 78.75%, and 79.69%, 86.59% and 72.16%, respectively. The CNN model trained to discriminate benign from malignant pulmonary nodules achieved accuracy, sensitivity and specificity of 90.12%, 90.66% and 89.47%, and 88.57%, 85.62% and 90.87%, respectively, on the two test sets. The fused model based on the YOLOv3 network and the CNN achieved accuracy, sensitivity and specificity of 93.82%, 94.85% and 92.59%, and 92.31%, 92.68% and 91.89%, respectively, on the two test sets.
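For readers unfamiliar with the three reported metrics, the sketch below shows how accuracy, sensitivity and specificity are conventionally computed from binary predictions, treating malignant as the positive class (an assumption; the abstract does not state which class is positive). The label vectors are hypothetical toy data, not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with 1 = malignant (positive), 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division: NaN when a class is absent from the test set."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # malignant nodules correctly found
        "specificity": ratio(tn, tn + fp),  # benign nodules correctly rejected
    }


# Hypothetical toy labels: 4 malignant and 6 benign nodules.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print("%s: %.2f%%" % (name, 100 * value))
```

Specificity is the metric most directly tied to false positive removal: each benign nodule wrongly flagged as malignant lowers it.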

Conclusion: The ensemble learning model is more effective than the YOLOv3 network and the CNN alone in removing false positives, and its accuracy in identifying pulmonary nodules is higher than that of the other two networks.

Keywords: CNN; CT images; Ensemble learning; Logistic regression; Pulmonary nodules; YOLOv3 network.

MeSH terms

  • Diagnosis, Computer-Assisted
  • Diagnosis, Differential
  • Humans
  • Lung Neoplasms* / diagnostic imaging
  • Machine Learning*
  • Radiographic Image Interpretation, Computer-Assisted / methods
  • Solitary Pulmonary Nodule* / diagnostic imaging
  • Tomography, X-Ray Computed / methods