Machine Learning-Based Radiomics Signatures for EGFR and KRAS Mutations Prediction in Non-Small-Cell Lung Cancer

Int J Mol Sci. 2021 Aug 26;22(17):9254. doi: 10.3390/ijms22179254.

Abstract

Early identification of epidermal growth factor receptor (EGFR) and Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations is crucial for selecting a therapeutic strategy for patients with non-small-cell lung cancer (NSCLC). We propose a machine learning-based model for feature selection and prediction of EGFR and KRAS mutations in patients with NSCLC that uses the smallest number of the most semantic radiomics features. From 211 patients with NSCLC in The Cancer Imaging Archive (TCIA), we included a cohort of 161 patients and analyzed their 161 low-dose computed tomography (LDCT) images to detect EGFR and KRAS mutations. A total of 851 radiomics features, classified into 9 categories, were obtained through manual segmentation and radiomics feature extraction from the LDCT images. We evaluated our models on a validation set of 18 patients derived from the same TCIA dataset. The genetic algorithm plus XGBoost classifier exhibited the most favorable performance, with accuracies of 0.836 and 0.86 for detecting EGFR and KRAS mutations, respectively. These results show that a noninvasive machine learning-based model using the smallest number of the most semantic radiomics signatures could robustly predict EGFR and KRAS mutations in patients with NSCLC.
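
The abstract does not specify the genetic algorithm's encoding, operators, or fitness function, so the following is only a minimal sketch of how genetic-algorithm feature selection of this kind might work. It is not the authors' implementation. It assumes a binary mask per candidate feature subset, truncation selection, single-point crossover, and bit-flip mutation. To stay dependency-free, it substitutes a simple nearest-centroid classifier for XGBoost and synthetic data for the radiomics features. All function names, parameters, and the per-feature penalty are hypothetical.

```python
import random

def make_toy_data(n_samples=120, n_features=20, n_informative=3, seed=0):
    """Synthetic stand-in for a radiomics table: only the first columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 2.0 * label  # shift informative features for the positive class
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_val, y_val, mask):
    """Validation accuracy of a nearest-centroid classifier on the masked features.
    (Assumption: this simple classifier stands in for XGBoost.)"""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        dist = {label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for label, cen in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_val)

def fitness(mask, data, penalty=0.01):
    """Accuracy minus a small per-feature penalty, favoring compact signatures."""
    X_train, y_train, X_val, y_val = data
    return centroid_accuracy(X_train, y_train, X_val, y_val, mask) - penalty * sum(mask)

def genetic_feature_selection(data, n_features, pop_size=30, generations=25,
                              mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda m: fitness(m, data), reverse=True)
        parents = scored[: pop_size // 2]  # truncation selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation: each gene flips with probability mutation_rate
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, data))

X, y = make_toy_data()
data = (X[:80], y[:80], X[80:], y[80:])  # simple train/validation split
best_mask = genetic_feature_selection(data, n_features=len(X[0]))
selected = [j for j, keep in enumerate(best_mask) if keep]
print("selected feature indices:", selected)
```

In a real pipeline, the fitness function would instead score a gradient-boosted classifier trained on the extracted radiomics features, and accuracy would be reported on a held-out validation set such as the 18 patients described above.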

Keywords: EGFR mutation; KRAS mutation; eXtreme Gradient Boosting; feature selection; genetic algorithm; low-dose computed tomography; machine learning; non-small-cell lung carcinoma; radiogenomics.

MeSH terms

  • Aged
  • Aged, 80 and over
  • Algorithms
  • Biomarkers
  • Carcinoma, Non-Small-Cell Lung / diagnostic imaging*
  • Carcinoma, Non-Small-Cell Lung / genetics*
  • Carcinoma, Non-Small-Cell Lung / pathology
  • ErbB Receptors / genetics
  • Female
  • Humans
  • Lung Neoplasms / diagnostic imaging*
  • Lung Neoplasms / genetics*
  • Lung Neoplasms / pathology
  • Machine Learning*
  • Male
  • Middle Aged
  • Mutation*
  • Neoplasm Staging
  • Proto-Oncogene Proteins p21(ras) / genetics*
  • ROC Curve
  • Reproducibility of Results
  • Supervised Machine Learning
  • Tomography, X-Ray Computed

Substances

  • Biomarkers
  • KRAS protein, human
  • EGFR protein, human
  • ErbB Receptors
  • Proto-Oncogene Proteins p21(ras)