Development of a Machine Learning Model Using Limited Features to Predict 6-Month Mortality at Treatment Decision Points for Patients With Advanced Solid Tumors

JCO Clin Cancer Inform. 2022 Mar:6:e2100163. doi: 10.1200/CCI.21.00163.

Abstract

Purpose: Patients with advanced solid tumors may receive intensive treatments near the end of life. This study aimed to create a machine learning (ML) model using limited features to predict 6-month mortality at treatment decision points (TDPs).

Methods: We identified a cohort of adults with advanced solid tumors receiving care at a major cancer center from 2014 to 2020. We identified TDPs for new lines of therapy (LoTs) and confirmed mortality at 6 months after a TDP. Using extreme gradient boosting, we developed ML models whose features were taken or derived from a limited set of electronic health record data. Features were selected on the basis of the literature, clinical relevance, variability, availability, and predictive importance as measured by Shapley additive explanations scores. We predicted and observed 6-month mortality after a TDP and assessed a risk stratification strategy with different risk thresholds to support communication of chance of survival.
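As an illustration of the outcome definition only, the following minimal sketch labels each TDP by whether a death is recorded within 6 months and tallies mortality by LoT. It is not the study's code: the 183-day horizon, the tuple layout, and the function names are assumptions made for the example, and it ignores censoring of patients with less than 6 months of follow-up.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days.
HORIZON = timedelta(days=183)


def died_within_horizon(tdp_date, death_date, horizon=HORIZON):
    """Return True if a death is recorded on or after the TDP and within the horizon."""
    if death_date is None:
        return False
    return tdp_date <= death_date <= tdp_date + horizon


def mortality_by_lot(tdps):
    """Compute the fraction of TDPs followed by death within the horizon, per LoT.

    tdps: iterable of (lot_number, tdp_date, death_date_or_None) tuples
    (a hypothetical layout chosen for this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # lot -> [deaths, total TDPs]
    for lot, tdp_date, death_date in tdps:
        counts[lot][0] += int(died_within_horizon(tdp_date, death_date))
        counts[lot][1] += 1
    return {lot: deaths / total for lot, (deaths, total) in sorted(counts.items())}
```

Because a patient can start several LoTs, the same patient may contribute more than one TDP to such a tally, which is consistent with the abstract reporting more TDPs than patients.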

Results: Four thousand one hundred ninety-two patients were included. Patients had 7,056 TDPs, for which the 6-month mortality increased from 17.9% to 46.7% after starting first to sixth LoT, respectively. On the basis of internal validation, models using either 111 (Full) or 45 (Limited-45) features accurately predicted 6-month mortality (area under the curve ≥ 0.80). Using a 0.3 risk threshold in the Limited-45 model, the observed 6-month survival was 34% (95% CI, 28 to 40) versus 81% (95% CI, 81 to 82) among those classified as having a low versus higher chance of survival, respectively. The positive predictive value of the Limited-45 model was 0.66 (95% CI, 0.60 to 0.72).
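The relationship between the risk threshold, observed survival in each group, and positive predictive value can be sketched as follows. This is a hedged illustration, not the authors' analysis: it assumes that a predicted mortality risk at or above the threshold means "low chance of survival", and it uses a Wilson score interval because the abstract does not state how the CIs were computed. All function names are hypothetical.

```python
from math import sqrt


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (approximately 95% for z=1.96)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def stratify(pred_risk, died, threshold=0.3):
    """Split TDPs at a risk threshold and summarize observed outcomes.

    pred_risk: predicted 6-month mortality probabilities.
    died: matching 0/1 indicators of observed death within 6 months.
    Assumes each group contains at least one TDP.
    """
    flagged = [d for p, d in zip(pred_risk, died) if p >= threshold]
    rest = [d for p, d in zip(pred_risk, died) if p < threshold]
    ppv = sum(flagged) / len(flagged)  # share of flagged TDPs followed by death
    return {
        "ppv": ppv,
        "ppv_ci": wilson_ci(sum(flagged), len(flagged)),
        "survival_low_chance": 1 - ppv,
        "survival_higher_chance": 1 - sum(rest) / len(rest),
    }
```

Under this definition, observed survival in the low-chance group is the complement of the positive predictive value, which matches the reported 34% survival and 0.66 positive predictive value.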

Conclusion: We developed and validated an ML model using a limited set of 45 features readily derived from electronic health record data to predict 6-month prognosis in patients with advanced solid tumors. The model output may support shared decision making as patients consider the next LoT.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • DNA-Binding Proteins
  • Humans
  • Machine Learning*
  • Neoplasms* / diagnosis
  • Neoplasms* / therapy
  • Predictive Value of Tests
  • Prognosis

Substances

  • DNA-Binding Proteins