Application of a novel machine learning framework for predicting non-metastatic prostate cancer-specific mortality in men using the Surveillance, Epidemiology, and End Results (SEER) database

Changhee Lee; Alexander Light; Ahmed Alaa; David Thurtle; Mihaela van der Schaar; Vincent J Gnanapragasam

doi:10.1016/S2589-7500(20)30314-9

Lancet Digit Health. 2021 Mar;3(3):e158-e165. doi: 10.1016/S2589-7500(20)30314-9. Epub 2021 Feb 3.

Authors

Changhee Lee¹, Alexander Light², Ahmed Alaa¹, David Thurtle², Mihaela van der Schaar³, Vincent J Gnanapragasam⁴

Affiliations

¹ Department of Electrical and Computer Engineering, University of California, Los Angeles, CA, USA.
² Department of Surgery, Division of Urology, University of Cambridge, Cambridge, UK; Department of Urology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.
³ Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK; Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
⁴ Department of Surgery, Division of Urology, University of Cambridge, Cambridge, UK; Cambridge Urology Translational Research and Clinical Trials Office, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK; Department of Urology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK. Electronic address: vjg29@cam.ac.uk.

PMID: 33549512

Abstract

Background: Accurate prognostication is crucial in treatment decisions made for men diagnosed with non-metastatic prostate cancer. Current models rely on prespecified variables, which limits their performance. We aimed to investigate a novel machine learning approach to develop an improved prognostic model for predicting 10-year prostate cancer-specific mortality and compare its performance with existing validated models.

Methods: We derived and tested a machine learning-based model using Survival Quilts, an algorithm that automatically selects and tunes ensembles of survival models using clinicopathological variables. Our study involved a US population-based cohort of 171 942 men diagnosed with non-metastatic prostate cancer between Jan 1, 2000, and Dec 31, 2016, from the prospectively maintained Surveillance, Epidemiology, and End Results (SEER) Program. The primary outcome was prediction of 10-year prostate cancer-specific mortality. Model discrimination was assessed using the concordance index (c-index), and calibration was assessed using Brier scores. The Survival Quilts model was compared with nine other prognostic models in clinical use, and decision curve analysis was done.
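
To illustrate the two evaluation metrics named above, here is a minimal Python sketch. It is not the study's code: the function names, the toy data, and the handling of censoring are assumptions made for illustration. The c-index follows Harrell's pairwise definition and skips pairs with tied follow-up times, and the Brier score is the plain mean squared error at a fixed horizon with no censoring weights, which a full survival analysis would normally add.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's c-index: share of comparable pairs ranked correctly.

    times  -- follow-up time per patient
    events -- 1 if death from the cause of interest was observed, 0 if censored
    risks  -- predicted risk score (higher means earlier expected death)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplification (assumption): skip tied times. A pair is only
        # comparable when the shorter follow-up ended in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    return concordant / comparable


def brier_score(predicted, observed):
    """Mean squared difference between predicted probability and outcome.

    Simplification (assumption): every patient's status at the horizon is
    known, so no inverse-probability-of-censoring weights are applied.
    """
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


# Hypothetical toy cohort, not SEER data.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 0, 1, 0, 0]
toy_risks = [0.9, 0.3, 0.1, 0.2, 0.4]
c_index = concordance_index(toy_times, toy_events, toy_risks)

toy_predicted = [0.8, 0.1, 0.3, 0.0]
toy_observed = [1, 0, 0, 0]
brier = brier_score(toy_predicted, toy_observed)
```

A c-index of 0·5 corresponds to random ranking and 1·0 to perfect ranking. For the Brier score, lower is better and 0 means perfect predictions.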

Findings: 647 151 men with prostate cancer were enrolled into the SEER database, of whom 171 942 were included in this study. Discrimination improved with greater granularity, and multivariable models outperformed tier-based models. The Survival Quilts model showed good discrimination (c-index 0·829, 95% CI 0·820-0·838) for 10-year prostate cancer-specific mortality, which was similar to the top-ranked multivariable models: PREDICT Prostate (0·820, 0·811-0·829) and Memorial Sloan Kettering Cancer Center (MSKCC) nomogram (0·787, 0·776-0·798). All three multivariable models showed good calibration with low Brier scores (Survival Quilts 0·036, 95% CI 0·035-0·037; PREDICT Prostate 0·036, 0·035-0·037; MSKCC 0·037, 0·035-0·039). Of the tier-based systems, the Cancer of the Prostate Risk Assessment model (c-index 0·782, 95% CI 0·771-0·793) and Cambridge Prognostic Groups model (0·779, 0·767-0·791) showed the highest discrimination for predicting 10-year prostate cancer-specific mortality. c-indices for models from the National Comprehensive Cancer Network, Genitourinary Radiation Oncologists of Canada, American Urological Association, European Association of Urology, and National Institute for Health and Care Excellence ranged from 0·711 (0·701-0·721) to 0·761 (0·750-0·772). Discrimination for the Survival Quilts model was maintained when stratified by age and ethnicity. Decision curve analysis showed an incremental net benefit from the Survival Quilts model compared with the MSKCC and PREDICT Prostate models currently used in practice.
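
The net benefit reported by a decision curve analysis can be sketched as follows. This uses the standard textbook definition, net benefit = TP/n - (FP/n) × pt/(1 - pt) at threshold probability pt. The function names and toy values are hypothetical and do not reproduce the study's analysis.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(predicted)
    pairs = list(zip(predicted, observed))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(observed, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy predictions and outcomes, not SEER data.
toy_predicted = [0.9, 0.7, 0.4, 0.2, 0.1]
toy_observed = [1, 0, 1, 0, 0]

nb_model = net_benefit(toy_predicted, toy_observed, threshold=0.3)
nb_all = net_benefit_treat_all(toy_observed, threshold=0.3)
```

A decision curve plots these quantities over a range of thresholds. One model shows an incremental net benefit over another when its curve lies above the other's across the clinically relevant thresholds.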

Interpretation: A novel machine learning-based approach produced a prognostic model, Survival Quilts, with discrimination for 10-year prostate cancer-specific mortality similar to the top-ranked prognostic models, using only standard clinicopathological variables. Future integration of additional data will likely improve model performance and accuracy for personalised prognostics.

Funding: None.

MeSH terms

Adult
Aged
Aged, 80 and over
Algorithms*
Databases, Factual
Humans
Machine Learning*
Male
Middle Aged
Nomograms
Prognosis
Prostate / pathology*
Prostatic Neoplasms / diagnosis*
Prostatic Neoplasms / mortality
Prostatic Neoplasms / pathology
Retrospective Studies
Risk Assessment
Survival Analysis
United States / epidemiology