Forecasting determinants of recurrence in lung cancer patients exploiting various machine learning models

J Biopharm Stat. 2023 May 4;33(3):257-271. doi: 10.1080/10543406.2022.2148162. Epub 2022 Nov 17.

Abstract

Lung cancer recurrence appears to be a leading cause of death and of shortened lifespan. Accurate assessment of the probability of recurrence in early-stage lung cancer is necessary to advance treatment. We therefore employed machine learning to forecast post-operative recurrence risk using 174 lung cancer patient records. Six classification algorithms (logistic regression, support vector machine (SVM), decision tree, random forest, XGBoost and LightGBM) were used to predict cancer recurrence. The patient samples were divided into training and test sets at a 3:1 ratio for model generation, and accuracy was validated using k-fold cross-validation. Notably, the logistic regression model outperformed all other models on both the training set (accuracy = 0.82) and the test set (accuracy = 0.79) under k-fold validation. Further, the optimal features (n = 7) identified using the recursive feature elimination (RFE) method are helpful for improving the model's precision. The key risk factors associated with recurrence were identified using three feature selection methods. Importantly, our research showed that age is an important prognostic factor to consider in recurrence prediction. Close attention to the identified risk factors, combined with predictive models, can assist physicians in reducing the recurrence rate in patients with lung cancer.
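To make the partitioning scheme concrete, the following is a minimal Python sketch of a 3:1 train/test split over 174 records followed by k-fold partitioning of the training portion. It is illustrative only and is not the authors' code: the abstract does not report the value of k, the random seed, whether the split was stratified, or the software used, so k = 5, seed = 0, the unstratified shuffle and all function names here are assumptions.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle record indices and split them into train and test lists.

    test_fraction=0.25 corresponds to a 3:1 train:test ratio.
    The seed is an arbitrary assumption, used only for reproducibility.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)  # truncates 43.5 to 43
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds.

    k=5 is an assumed value; the abstract only says "k-fold".
    """
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 174 patient records, as reported in the abstract.
train_idx, test_idx = train_test_split_indices(174)
print(len(train_idx), len(test_idx))  # prints "131 43"

# Each fold holds out a different slice of the training indices.
for fold_train, fold_val in k_fold_indices(train_idx, k=5):
    print(len(fold_train), len(fold_val))
```

In practice, a classifier would be fitted on each fold's training indices and scored with `accuracy` on the held-out validation indices, and the per-fold scores averaged. With a dataset this small, a stratified split that preserves the proportion of recurrent cases in each partition is often preferred, although the abstract does not say whether one was used.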

Keywords: Lung cancer; cross-validation; feature importance; machine learning; recurrence prediction.

MeSH terms

  • Algorithms
  • Forecasting
  • Humans
  • Lung Neoplasms* / diagnosis
  • Lung Neoplasms* / epidemiology
  • Machine Learning
  • Neoplasm Recurrence, Local* / epidemiology