Radiomics feature analysis and model research for predicting histopathological subtypes of non-small cell lung cancer on CT images: A multi-dataset study

Fan Song; Xiao Song; Youdan Feng; Guangda Fan; Yangyang Sun; Peng Zhang; Jinkai Li; Fei Liu; Guanglei Zhang

doi:10.1002/mp.16233

Med Phys. 2023 Jul;50(7):4351-4365. doi: 10.1002/mp.16233. Epub 2023 Feb 1.

Authors

Fan Song¹, Xiao Song², Youdan Feng¹, Guangda Fan¹, Yangyang Sun¹, Peng Zhang¹, Jinkai Li³, Fei Liu⁴, Guanglei Zhang¹

Affiliations

¹ Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China.
² School of Medical Imaging, Shanxi Medical University, Taiyuan, China.
³ School of General Engineering, Beihang University, Beijing, China.
⁴ Beijing Advanced Information & Industrial Technology Research Institute, Beijing Information Science & Technology University, Beijing, China.

PMID: 36682051
DOI: 10.1002/mp.16233

Abstract

Purpose: Classifying the subtypes of non-small cell lung cancer (NSCLC) is essential for choosing optimal treatment strategies and improving clinical outcomes, but histological subtypes are currently confirmed by invasive biopsy or post-operative examination. Using multi-center data, this study aimed to analyze the importance of extracted CT radiomics features and to develop a model with good generalization performance for accurately distinguishing the two major NSCLC subtypes: adenocarcinoma (ADC) and squamous cell carcinoma (SCC).

Methods: We collected a multi-center CT dataset of 868 patients from eight international databases in The Cancer Imaging Archive (TCIA). Patients from five of the databases were pooled and split into a training set and a test set (560:140). The remaining three databases formed two independent test sets: the TCGA set (n = 97) and the lung3 set (n = 71). A total of 1409 features describing shape, intensity, and texture were extracted from the tumor volume of interest (VOI); ℓ2,1-norm minimization was then used for feature selection, and the importance of the selected features was analyzed. Next, the prediction and generalization performance of 130 radiomics models (10 common algorithms and 120 heterogeneous ensemble combinations) were compared using the average AUC over the three test sets. Finally, the predictive results of the optimal model were presented.
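As an illustration of the quantity behind the feature-selection step, the sketch below computes the ℓ2,1-norm of a weight matrix (the sum of the Euclidean norms of its rows) and ranks features by row norm. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the optimization objective or solver, and the function names, the tolerance, and the toy matrix W are all hypothetical.

```python
import math


def row_norms(W):
    """Return the Euclidean (l2) norm of each row of W (a list of lists)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W):
    """l2,1-norm of W: the sum of the l2 norms of its rows."""
    return sum(row_norms(W))


def select_features(W, tol=1e-8):
    """Indices of rows whose norm exceeds tol, sorted by decreasing norm.

    Assumption: each row of W holds the learned weights of one feature,
    so a (near-)zero row means that feature is discarded.
    """
    norms = row_norms(W)
    kept = [i for i, n in enumerate(norms) if n > tol]
    return sorted(kept, key=lambda i: norms[i], reverse=True)


# Hypothetical weight matrix: 4 features (rows) x 2 outputs (columns).
W = [[3.0, 4.0], [0.0, 0.0], [0.6, 0.8], [0.0, 2.0]]

total = l21_norm(W)            # row norms are 5, 0, 1, 2
selected = select_features(W)  # feature 1 has a zero row and is dropped
```

Penalizing this norm pushes entire rows toward zero, which is why it can act as a feature selector: a feature is either kept with all of its weights or removed as a whole.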

Results: After feature selection, 401 features were retained. Intensity features and the GLCM, GLRLM, and GLSZM texture features had higher classification weight coefficients than the other features (shape, and the GLDM and NGTDM texture features), and features from filtered images were significantly more important than features from the original images (p-value = 0.0210). Moreover, the five ensemble learning algorithms (Bagging, AdaBoost, RF, XGBoost, GBDT) had better generalization performance (p-value = 0.00418) than the non-ensemble algorithms (MLP, LR, GNB, SVM, KNN). The Bagging-AdaBoost-SVM model had the highest average AUC (0.815 ± 0.010) over the three test sets, with AUC values of 0.819, 0.823, and 0.804 on the test set, the TCGA set, and the lung3 set, respectively.
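The summary AUC can be checked against the three per-set values reported above. The sketch below is only an arithmetic check; the abstract does not say how the "±" term was computed, so treating it as the sample standard deviation (n - 1 denominator) is an assumption.

```python
import statistics

# AUC values reported in the abstract for the Bagging-AdaBoost-SVM model.
auc = {"test": 0.819, "TCGA": 0.823, "lung3": 0.804}
values = list(auc.values())

mean_auc = statistics.mean(values)
# Assumption: "±" denotes the sample standard deviation (n - 1 denominator).
sd_auc = statistics.stdev(values)
# Population standard deviation (n denominator), shown for comparison.
pop_sd_auc = statistics.pstdev(values)

print(f"{mean_auc:.3f} ± {sd_auc:.3f}")  # prints "0.815 ± 0.010"
```

Under that assumption the three values reproduce the reported 0.815 ± 0.010; the population form would give roughly 0.008 instead.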

Conclusion: Our multi-dataset study showed that intensity features, texture features (GLCM, GLRLM, and GLSZM), and filtered image features were the most important for distinguishing ADCs from SCCs. Ensemble learning can improve prediction and generalization performance on complex multi-center data. The Bagging-AdaBoost-SVM model had the strongest generalization performance and showed promising clinical value for non-invasively predicting the histopathological subtypes of NSCLC.

Keywords: CT; NSCLC subtype classification; ensemble learning; generalization performance; multi-center dataset; radiomics method.

MeSH terms

Adenocarcinoma* / pathology
Algorithms
Carcinoma, Non-Small-Cell Lung* / diagnostic imaging
Carcinoma, Non-Small-Cell Lung* / pathology
Humans
Lung Neoplasms* / diagnostic imaging
Lung Neoplasms* / pathology
Retrospective Studies
Tomography, X-Ray Computed / methods
