Screening of lung cancer serum biomarkers based on Boruta-shap and RFC-RFECV algorithms

J Proteomics. 2024 Jun 15:301:105180. doi: 10.1016/j.jprot.2024.105180. Epub 2024 Apr 24.

Abstract

Objective: This study aimed to identify a set of serum miRNAs as potential biomarkers for lung cancer diagnosis using algorithmic approaches.

Methods: Serum miRNA expression data from lung cancer patients and non-tumor controls were obtained. The top six miRNAs were selected using the Boruta-shap and RFC-RFECV algorithms. A Gaussian Naive Bayes (NB) classifier was then trained and evaluated using cross-validation, receiver operating characteristic (ROC) curve analysis, and classification metrics (accuracy, recall, and F1 score).
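The selection and classification steps described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes RFC-RFECV refers to scikit-learn's `RFECV` wrapped around a `RandomForestClassifier`, it omits the Boruta-shap step entirely, and it runs on a synthetic matrix from `make_classification` rather than real serum miRNA data. The fold count, the number of trees, and `min_features_to_select=6` are assumed values chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for a (samples x miRNAs) expression matrix.
# This is NOT the study's data; sizes are arbitrary assumptions.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=6, n_redundant=0, random_state=0
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Recursive feature elimination with cross-validation, using random forest
# feature importances to decide which features to drop at each step.
selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=2,                      # drop two features per iteration
    cv=cv,
    scoring="roc_auc",
    min_features_to_select=6,    # assumed floor, echoing the six-miRNA panel
)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)  # column indices of kept features

# Gaussian Naive Bayes on the retained features, scored by cross-validated AUC.
auc_scores = cross_val_score(
    GaussianNB(), X[:, selected], y, cv=cv, scoring="roc_auc"
)
mean_auc = auc_scores.mean()
print(f"kept {selected.size} features, mean CV AUC = {mean_auc:.3f}")
```

Note that selecting features on the full dataset and then cross-validating on the same samples tends to give optimistic AUC estimates; placing the selector and the classifier in one scikit-learn `Pipeline` would keep the selection inside each fold.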

Results: Six miRNAs (hsa-miRNA-144, hsa-miRNA-107, hsa-miRNA-484, hsa-miRNA-103, hsa-miRNA-26b, and hsa-miRNA-641) were identified as feature genes. The NB classifier achieved an area under the curve (AUC) of 0.8966 and a mean AUC of 0.88 in cross-validation, with an accuracy of 82%; recall and F1 scores were also promising. In the validation set, the AUC values for the NB and support vector classifier (SVC) models were 0.9345 and 0.9423, respectively, with a mean AUC of 0.95 in cross-validation. The classifiers demonstrated an accuracy of 89% in diagnosing lung cancer.
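For readers less familiar with the metrics reported above, the following sketch shows how AUC, accuracy, recall, and F1 are computed from labels and classifier scores. The labels, scores, and the 0.5 decision threshold are invented toy values, not data from the study, and the helper names `roc_auc` and `classification_metrics` are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision, and F1 from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Toy example: 1 = lung cancer, 0 = non-tumor control (made-up values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
metrics = classification_metrics(y_true, y_pred)
print(auc, metrics)
```

AUC is threshold-independent, whereas accuracy, recall, and F1 depend on the chosen cutoff, which is why a study can report a high AUC alongside a somewhat lower accuracy.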

Conclusion: This study identified a panel of six serum miRNAs with potential as non-invasive biomarkers for lung cancer diagnosis. These miRNAs show promise in providing sensitive and specific tools for detecting lung cancer.

Significance: Lung cancer is one of the most common cancers worldwide, threatening the health and lives of tens of thousands of people. miRNAs are biomarkers that can serve as potential clinical tools for the diagnosis and prognosis of cancer patients. Diagnostic models built from multiple miRNAs may therefore become one approach to accurate lung cancer diagnosis in the future. In this study, we used the Boruta-shap and RFC-RFECV algorithms to automatically identify and extract characteristic miRNAs highly associated with lung cancer, and used these miRNAs to establish an accurate classifier for lung cancer diagnosis.

Keywords: Boruta-shap; Diagnosis; Lung cancer; RFC-RFECV; Serum biomarkers.

MeSH terms

  • Aged
  • Algorithms*
  • Bayes Theorem
  • Biomarkers, Tumor* / blood
  • Female
  • Humans
  • Lung Neoplasms* / blood
  • Lung Neoplasms* / diagnosis
  • Male
  • MicroRNAs* / blood
  • Middle Aged

Substances

  • Biomarkers, Tumor
  • MicroRNAs