Prognostic value of plasma microRNAs for non-small cell lung cancer based on data mining models

BMC Cancer. 2024 Jan 10;24(1):52. doi: 10.1186/s12885-024-11830-9.

Abstract

Background: As biomarkers, microRNAs (miRNAs) are closely associated with the occurrence, progression, and prognosis of non-small cell lung cancer (NSCLC). However, the prognostic predictive value of miRNAs in NSCLC has rarely been explored. In this study, data mining models built on clinical data and plasma miRNA biomarkers were used to assess that value for predicting NSCLC prognosis.

Methods: A total of 69 patients were included in this prospective cohort study. After giving informed consent, they completed questionnaires and had peripheral blood collected. Plasma miRNA expression was measured by quantitative polymerase chain reaction (qPCR). The Mann-Whitney U test was used to analyze non-normally distributed data. The Kaplan-Meier method was used to plot survival curves, the log-rank test was used to compare overall survival between groups, and the Cox proportional hazards model was used to screen for factors related to the prognosis of lung cancer. Data mining techniques were then used to predict patients' prognostic status.
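The Kaplan-Meier step named in the Methods can be illustrated with a minimal product-limit estimator. This is only a sketch, not the authors' code: the function name `kaplan_meier` and the toy follow-up times and event flags are hypothetical and do not come from the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  -- follow-up time for each patient
    events -- 1 if the event (death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical toy data (months of follow-up, event indicator).
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.4f}")
```

Censored patients reduce the risk set without lowering the curve, which is why the estimate only steps down at observed event times.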

Results: We identified that smoking (HR = 2.406, 95% CI = 1.256-4.611), clinical stage III + IV (HR = 5.389, 95% CI = 2.290-12.684), the high expression group of miR-20a (HR = 4.420, 95% CI = 1.760-11.100), the high expression group of miR-197 (HR = 3.828, 95% CI = 1.778-8.245), the low expression group of miR-145 (HR = 0.286, 95% CI = 0.116-0.709), and the low expression group of miR-30a (HR = 0.307, 95% CI = 0.133-0.706) were associated with worse prognosis. Among the five data mining models, the decision tree (DT) C5.0 model performed best, with an accuracy of 93.75% and an area under the curve (AUC) of 0.929 (0.685, 0.997).
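The reported hazard ratios can be read on the log scale, where a Wald-type 95% interval is symmetric around log(HR). The sketch below recovers the implied standard error and checks that each HR sits at the geometric midpoint of its bounds. It assumes the intervals are Wald intervals with z = 1.96, which the abstract does not state; the function name `log_scale_summary` is hypothetical.

```python
import math


def log_scale_summary(hr, lower, upper, z=1.96):
    """Summarise a hazard ratio and its CI on the log scale.

    Assumes a Wald-type interval: log(HR) +/- z * SE.
    Returns (SE of log HR, geometric midpoint of the bounds, Wald z statistic).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.sqrt(lower * upper)  # equals HR if the CI is log-symmetric
    wald_z = math.log(hr) / se
    return se, midpoint, wald_z


# (HR, lower bound, upper bound) as reported in the Results.
reported = {
    "smoking": (2.406, 1.256, 4.611),
    "stage III + IV": (5.389, 2.290, 12.684),
    "miR-20a": (4.420, 1.760, 11.100),
    "miR-197": (3.828, 1.778, 8.245),
    "miR-145": (0.286, 0.116, 0.709),
    "miR-30a": (0.307, 0.133, 0.706),
}

summaries = {name: log_scale_summary(*vals) for name, vals in reported.items()}
for name, (se, mid, wald_z) in summaries.items():
    print(f"{name}: SE(log HR)={se:.3f}, midpoint={mid:.3f}, z={wald_z:.2f}")
```

A hazard ratio below 1 gives a negative z, meaning a lower hazard for the group in the numerator of the ratio. The abstract does not say which expression group served as the reference for miR-145 and miR-30a, so the direction of those two estimates should be checked against the full text.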

Conclusion: High expression of miR-20a and miR-197 and low expression of miR-145 and miR-30a were strongly associated with poorer prognosis in NSCLC patients, and the DT C5.0 model may serve as a novel, accurate method for predicting the prognosis of NSCLC.

Keywords: Data mining; MicroRNA; Non-small cell lung cancer; Prediction; Prognosis.

MeSH terms

  • Biomarkers
  • Carcinoma, Non-Small-Cell Lung* / genetics
  • Data Mining
  • Humans
  • Lung Neoplasms* / genetics
  • MicroRNAs* / genetics
  • Prognosis
  • Prospective Studies

Substances

  • MicroRNAs
  • Biomarkers
  • MIRN145 microRNA, human