Recurrence prediction of lung adenocarcinoma using an immune gene expression and clinical data trained and validated support vector machine classifier

Transl Lung Cancer Res. 2023 Oct 31;12(10):2055-2067. doi: 10.21037/tlcr-23-473. Epub 2023 Oct 27.

Abstract

Background: The immune microenvironment plays a critical role in cancer from onset to relapse. Machine learning (ML) algorithms can facilitate the analysis of laboratory and clinical data to predict lung cancer recurrence. Prompt detection of and intervention for relapse are crucial for long-term survival in lung cancer. Our study aimed to evaluate clinical and genomic prognosticators of lung cancer recurrence by comparing the predictive accuracy of four ML models.

Methods: A total of 41 early-stage lung cancer patients who underwent surgery between June 2007 and October 2014 at New York University Langone Medical Center were included (with recurrence, n=16; without recurrence, n=25). All patients had tumor tissue and buffy coat collected at the time of resection. The CIBERSORT algorithm was used to quantify tumor-infiltrating immune cells (TIICs). Protein-protein interaction (PPI) network and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted to identify potential molecular drivers of tumor progression. The data were split into training (75%) and validation (25%) sets. Ensemble linear kernel support vector machine (SVM) ML models were developed using optimized clinical and genomic features to predict tumor recurrence.
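The feature optimization and linear SVM step can be illustrated with a minimal sketch of SVM with recursive feature elimination (SVM-RFE, as named in the keywords). This is not the authors' pipeline: the toy data, the hyperparameters (`lam`, `lr`, `epochs`), and the function names `train_linear_svm`, `svm_rfe`, and `predict` are all hypothetical, the solver is plain subgradient descent on the hinge loss, and the ensemble step and the 75%/25% split are omitted.

```python
# Sketch only: a tiny linear SVM wrapped in recursive feature elimination.
# All data, names, and hyperparameters here are hypothetical illustrations.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise the L2-regularised hinge loss; labels must be +1 or -1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Refit the SVM and drop the feature with the smallest |weight| each round."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(kept)), key=lambda k: abs(w[k]))
        kept.pop(weakest)
    return kept


def predict(w, b, x):
    """Return +1 (e.g. recurrence) or -1 (no recurrence) for one sample."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: feature 0 separates the classes, features 1-2 are noise.
X = [
    [2.0, 0.3, -0.1], [2.0, -0.2, 0.2], [2.0, 0.1, 0.1], [2.0, -0.3, -0.2],
    [-2.0, 0.2, 0.1], [-2.0, -0.1, -0.2], [-2.0, 0.3, 0.2], [-2.0, -0.3, -0.1],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

selected = svm_rfe(X, y, n_keep=1)
w, b = train_linear_svm([[row[j] for j in selected] for row in X], y)
print(selected, predict(w, b, [2.0]), predict(w, b, [-2.0]))
```

In practice, feature elimination should be run inside the training set only, so that the validation set plays no part in choosing features.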

Results: Activated natural killer (NK) cells, M0 macrophages, and M1 macrophages were positively correlated with progression, whereas resting memory CD4+ T cells were negatively correlated. In the PPI network, TNF and IL6 emerged as prominent hub genes. Prediction models based on clinicopathological prognostic factors, tumor gene expression (45 genes), and buffy coat gene expression (47 genes) yielded areas under the receiver operating characteristic (ROC) curve (AUCs) of 62.7%, 65.4%, and 59.7%, respectively, in the training set and 58.3%, 83.3%, and 75.0%, respectively, in the validation set. Notably, merging gene expression with clinical data in a linear SVM model yielded markedly better discrimination, with an AUC of 92.0% in training and 91.7% in validation.
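For readers less familiar with the metric, the ROC-AUC reported above equals the probability that a randomly chosen patient with recurrence receives a higher model score than a randomly chosen patient without recurrence. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the study data.

```python
# Sketch only: ROC-AUC as the fraction of (positive, negative) pairs that the
# model ranks correctly, counting ties as half. Example values are hypothetical.

def roc_auc(labels, scores):
    """labels: 1 = recurrence, 0 = no recurrence; scores: model outputs."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive-negative pairs are ordered correctly
```

With a small validation set, the number of positive-negative pairs is small, so the AUC can only take a few discrete values and a single reordered pair shifts it noticeably.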

Conclusions: When analyzed with ML algorithms, immune gene expression data from tumor tissue and buffy coat may improve the accuracy of lung cancer recurrence prediction.

Keywords: Lung adenocarcinoma (LUAD); gene expression; machine learning (ML); recurrence; support vector machine with recursive feature elimination (SVM-RFE).