Development and Validation of a Machine Learning Approach Leveraging Real-World Clinical Narratives as a Predictor of Survival in Advanced Cancer

JCO Clin Cancer Inform. 2022 Oct:6:e2200064. doi: 10.1200/CCI.22.00064.

Abstract

Purpose: Predicting short-term mortality in patients with advanced cancer remains challenging. It is unclear whether digitized clinical text can be used to build models that improve survival prediction in this population.

Materials and methods: We conducted a single-center retrospective cohort study in patients with advanced solid tumors. Clinical correspondence authored by oncologists at the first patient encounter was extracted from the electronic medical records. Machine learning (ML) models were trained on narratives from the derivation cohort and then tested on a temporal validation cohort at the same site. Performance was benchmarked against Eastern Cooperative Oncology Group performance status (PS), comparing ML models alone (comparison 1) or in combination with PS (comparison 2). Performance was assessed by the area under the receiver operating characteristic curve (AUC) for predicting vital status at 11 time points from 2 to 52 weeks.
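The benchmarking step described above can be illustrated with a minimal sketch. It is not the authors' code: the toy data, the helper names `auc` and `died_by`, and the four horizons are hypothetical, and complete follow-up (no censoring before each horizon) is assumed. The AUC is computed from the rank-based (Mann-Whitney) definition using only the Python standard library.

```python
from statistics import mean


def auc(scores, labels):
    """AUC as the probability that a patient who died is scored higher
    than a patient who survived (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def died_by(survival_weeks, horizon_weeks):
    """Vital status at a horizon: 1 = dead by that week, 0 = alive.
    Assumes every patient was followed past the horizon (no censoring)."""
    return [1 if w <= horizon_weeks else 0 for w in survival_weeks]


# Toy data, invented for illustration only (not from the study).
survival_weeks = [3, 10, 30, 60, 80, 5]
ps = [3, 1, 1, 0, 1, 2]                       # ECOG PS, higher = worse
text_score = [0.9, 0.7, 0.4, 0.1, 0.2, 0.8]   # hypothetical model risk score

# Illustrative horizons; the study assessed 11 time points from 2 to 52 weeks.
horizons = (4, 12, 26, 52)

improvements = []
for h in horizons:
    status = died_by(survival_weeks, h)
    auc_ps = auc(ps, status)
    auc_ml = auc(text_score, status)
    improvements.append(auc_ml - auc_ps)
    print(f"{h:>2} weeks: PS AUC = {auc_ps:.3f}, text-model AUC = {auc_ml:.3f}")

# Summary across horizons: mean AUC difference (model minus PS).
mean_improvement = mean(improvements)
print(f"Mean AUC improvement over PS: {mean_improvement:.3f}")
```

In a real analysis, censored patients would need explicit handling at each horizon, and the significance of the AUC difference would be tested with a method the abstract does not specify.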

Results: ML models were built on the derivation cohort (4,791 patients from 2001 to April 2017) and tested on the validation cohort of 726 patients (May 2017-June 2019). In the 441 patients (61%) for whom clinical narratives were available and PS was documented, ML models outperformed PS (mean AUC improvement, 0.039; P < .001; comparison 1). Including both clinical text and PS in ML models further improved prediction accuracy over PS, with a mean AUC improvement of 0.050 (P < .001, comparison 2); the AUC was > 0.80 at all assessed time points for models incorporating clinical text. Exploratory analysis of oncologists' narratives revealed recurring descriptors correlating with survival, including referral patterns, mobility, physical function, and concomitant medications.

Conclusion: Applying ML to oncologists' narratives, with or without the patient's PS, significantly improved survival prediction up to 12 months, suggesting the utility of clinical text in building prognostic support tools.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Electronic Health Records
  • Humans
  • Machine Learning*
  • Neoplasms* / diagnosis
  • Neoplasms* / epidemiology
  • Neoplasms* / therapy
  • Prognosis
  • Retrospective Studies