Prediction of positive pulmonary nodules based on machine learning algorithm combined with central carbon metabolism data

J Cancer Res Clin Oncol. 2024 Jan 25;150(2):33. doi: 10.1007/s00432-024-05610-y.

Abstract

Background: Lung cancer imposes a heavy disease burden, and early detection of positive pulmonary nodules (PPNs), an early sign of lung cancer, is critical for effective intervention. There is therefore a need for a PPN risk recognizer that combines machine learning algorithms with central carbon metabolomics.

Methods: The study included 2248 participants at high risk for lung cancer from the Ma'anshan Community Lung Cancer Screening cohort. The Least Absolute Shrinkage and Selection Operator (LASSO) was used to screen 18 central carbon-related metabolites in plasma, and recursive feature elimination (RFE) was applied to all 42 features for feature selection; five machine learning algorithms were then used for model development. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. In addition, SHapley Additive exPlanations (SHAP) analysis was performed to assess the interpretability of the final selected model and to gain insight into how individual features influence the predicted results.
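The evaluation metrics listed in the Methods can be illustrated with a minimal plain-Python sketch. It assumes binary labels (1 = PPN, 0 = not PPN) and one predicted probability per participant; the function names, the 0.5 decision threshold, and the toy data are hypothetical choices for illustration, not the study's code. AUC is computed here in its pairwise (Mann-Whitney) form, which equals the area under the ROC curve.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 = PPN (assumed encoding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute AUC from scores, then threshold-based accuracy, precision, recall, and F1."""
    auc = roc_auc(y_true, scores)
    # Hypothetical decision rule: predict PPN when the score reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"auc": auc, "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    metrics = evaluate([1, 1, 1, 0, 0, 0], [0.9, 0.7, 0.4, 0.6, 0.2, 0.1])
    print({k: round(v, 3) for k, v in metrics.items()})
    # prints {'auc': 0.889, 'accuracy': 0.667, 'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```

AUC is threshold-free and depends only on how cases are ranked, whereas accuracy, precision, recall, and F1 all change with the chosen cut-off, which is why studies typically report both kinds of measure.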

Results: The two prediction models based on the random forest (RF) algorithm performed best, with AUC values of 0.87 and 0.83, respectively, outperforming the other models. Homogentisic acid, fumaric acid, maleic acid, hippuric acid, gluconic acid, and succinic acid played a significant role in both the PPN prediction model and the NPNs vs PPNs model, whereas 2-oxoadipic acid contributed only to the former and phosphopyruvate only to the NPNs vs PPNs model. These models demonstrate the potential of central carbon metabolism for PPN risk prediction and identification.
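The feature contributions reported in the Results rest on SHAP, which attributes each prediction to its input features via Shapley values. The sketch below computes exact Shapley values by enumerating every feature coalition, assuming a hypothetical three-feature toy model (not the study's RF model) and assuming that features "absent" from a coalition take the value of a single baseline vector. The SHAP library itself typically averages over background data and uses faster model-specific algorithms (for example, TreeSHAP for tree ensembles such as random forests); exhaustive enumeration is exponential in the number of features and is shown only to make the definition concrete.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline), by coalition enumeration."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition take the participant's value; the rest take the baseline value.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical risk score with two main effects and one interaction (illustration only)."""
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


if __name__ == "__main__":
    phis = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
    print([round(p, 6) for p in phis])  # prints [2.25, 1.0, 0.25]
```

The interaction term's contribution (0.5) is split equally between features 0 and 2, and the values sum to model(x) - model(baseline) = 3.5, the efficiency property that lets SHAP values be read as an additive breakdown of a single prediction.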

Conclusion: We developed a series of predictive models for PPNs, which can help in the early detection of PPNs and thus reduce the risk of lung cancer.

Keywords: Central carbon metabolism; Machine learning; Predictive model; Pulmonary nodule; SHapley Additive exPlanations.

MeSH terms

  • Algorithms
  • Carbon
  • Early Detection of Cancer
  • Humans
  • Lung Neoplasms* / diagnosis
  • Machine Learning
  • Multiple Pulmonary Nodules*

Substances

  • Carbon