Automatic classification of pulmonary nodules in computed tomography images using pre-trained networks and bag of features

Multimed Tools Appl. 2023 Apr 20:1-17. doi: 10.1007/s11042-023-14900-5. Online ahead of print.

Abstract

Lung cancer has the highest incidence in the world. The standard tests for its diagnostics are medical imaging exams, sputum cytology, and lung biopsy. Computed Tomography (CT) of the chest plays an essential role in the early detection of nodules since it can allow for more treatment options and increases patient survival. However, the analysis of these exams is a tiring and error-prone process. Thus, computational methods can help the specialist in this analysis. This work addresses the classification of pulmonary nodules as benign or malignant on CT images. Our approach uses the pre-trained VGG16, VGG19, Inception, Resnet50, and Xception, to extract features from each 2D slice of the 3D nodules. Then, we use Principal Component Analysis to reduce the dimensionality of the feature vectors and make them all the same length. Then, we use Bag of Features (BoF) to combine the feature vectors of the different 2D slices and generate only one signature representing the 3D nodule. The classification step uses Random Forest. We evaluated the proposed method with 1,405 segmented nodules from the LIDC-IDRI database and obtained an accuracy of 95.34%, F1-Score of 91.73, kappa of 0.88, sensitivity of 90.53%, specificity of 97.26% and AUC of 0.99. The main conclusion was that the combination by BoF of features extracted from 2D slices using pre-trained architectures produced better results than training 2D and 3D CNNs in the nodules. In addition, the use of BoF also makes the creation of the nodule signature independent of the number of slices.

Keywords: Bag of features; CNN; Lung cancer; Medical images.