Predicting survival after radical prostatectomy: Variation of machine learning performance by race

Madhur Nayan; Keyan Salari; Anthony Bozzo; Wolfgang Ganglberger; Filipe Carvalho; Adam S Feldman; Quoc-Dien Trinh

doi:10.1002/pros.24233

Prostate. 2021 Dec;81(16):1355-1364. doi: 10.1002/pros.24233. Epub 2021 Sep 16.

Authors

Madhur Nayan¹, Keyan Salari¹ ², Anthony Bozzo³, Wolfgang Ganglberger⁴, Filipe Carvalho¹, Adam S Feldman¹, Quoc-Dien Trinh⁵

Affiliations

¹ Department of Urology, Massachusetts General Hospital, Boston, Massachusetts, USA.
² Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.
³ Division of Orthopaedic Surgery, Department of Surgery, McMaster University, Hamilton, Ontario, Canada.
⁴ Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA.
⁵ Department of Urology, Brigham and Women's Hospital, Boston, Massachusetts, USA.

PMID: 34529282
DOI: 10.1002/pros.24233

Abstract

Background: Robust prediction of survival can facilitate clinical decision-making and patient counselling. Non-Caucasian males are underrepresented in most prostate cancer databases. We evaluated the variation in performance of a machine learning (ML) algorithm trained to predict survival after radical prostatectomy in race subgroups.

Methods: We used the National Cancer Database (NCDB) to identify patients undergoing radical prostatectomy between 2004 and 2016. We grouped patients by race into Caucasian, African-American, or non-Caucasian, non-African-American (NCNAA) subgroups. We trained an Extreme Gradient Boosting (XGBoost) classifier to predict 5-year survival in different training samples: naturally race-imbalanced, race-specific, and synthetically race-balanced. We evaluated performance in the test sets.
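The evaluation design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: it uses only the Python standard library, invented toy labels, and hypothetical subgroup names "A" and "B". The helper names `f1_score`, `f1_by_group`, and `oversample_to_balance` are assumptions made for illustration. The abstract does not say how the synthetically race-balanced sample was generated, so plain random oversampling stands in for it here, and treating label 1 as the positive class is likewise an assumption.

```python
import random
from collections import Counter, defaultdict


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_by_group(y_true, y_pred, groups):
    """Compute F1 separately within each subgroup of the test set."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    return {g: f1_score(ts, ps) for g, (ts, ps) in by_group.items()}


def oversample_to_balance(rows, group_of, seed=0):
    """Random oversampling (an assumed stand-in for synthetic balancing):
    resample each smaller group with replacement up to the largest group's size."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for row in rows:
        buckets[group_of(row)].append(row)
    target = max(len(b) for b in buckets.values())
    balanced = []
    for b in buckets.values():
        balanced.extend(b)
        balanced.extend(rng.choices(b, k=target - len(b)))
    return balanced


# Toy data (invented): labels, predictions, and subgroup membership.
y_true = [1, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]
groups = ["A", "A", "A", "A", "A", "B", "B", "B"]

scores = f1_by_group(y_true, y_pred, groups)

# Balance a training sample by subgroup before fitting a classifier.
rows = list(zip(groups, y_true))
balanced = oversample_to_balance(rows, group_of=lambda r: r[0])
group_counts = Counter(g for g, _ in balanced)
```

In a real analysis the balancing step would be applied to the training split only, with the subgroup F1 scores computed on an untouched test split, so that resampled duplicates cannot leak into the evaluation.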

Results: A total of 68,630 patients met inclusion criteria. Of these, 57,635 (84%) were Caucasian, 8173 (12%) were African-American, and 2822 (4%) were NCNAA. For the classifier trained in the naturally race-imbalanced sample, the F1 scores were 0.514 (95% confidence interval: 0.513-0.511), 0.511 (0.511-0.512), 0.545 (0.541-0.548), and 0.378 (0.378-0.389) in the race-imbalanced, Caucasian, African-American, and NCNAA test samples, respectively. For all race subgroups, classifiers trained in the race-specific or synthetically race-balanced samples achieved F1 scores similar to those of the classifier trained in the naturally race-imbalanced sample.
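For reference, the F1 score reported in the Results is the harmonic mean of precision and recall. The standard definition is given below in terms of true positives (TP), false positives (FP), and false negatives (FN). Which outcome at 5 years is treated as the positive class is not stated in this abstract.

```latex
\[
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN},
\]
\[
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}.
\]
```

Because true negatives do not appear in the formula, F1 depends on the prevalence of the positive class, so scores are best compared between subgroups with this in mind.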

Conclusions: An ML algorithm trained using NCDB data to predict survival after radical prostatectomy demonstrates variation in performance by race, regardless of whether the algorithm is trained in a naturally race-imbalanced, race-specific, or synthetically race-balanced sample. These results emphasize the importance of thoroughly evaluating ML algorithms in race subgroups before clinical deployment to avoid potential disparities in care.

Keywords: machine learning; prostatectomy; prostatic neoplasms; race; survival.

MeSH terms

Algorithms
Clinical Decision-Making
Ethnicity / statistics & numerical data
Humans
Machine Learning
Male
Middle Aged
Prognosis
Prostate* / pathology
Prostate* / surgery
Prostatectomy* / adverse effects
Prostatectomy* / methods
Prostatectomy* / statistics & numerical data
Prostatic Neoplasms* / mortality
Prostatic Neoplasms* / pathology
Prostatic Neoplasms* / surgery
Risk Assessment* / ethnology
Risk Assessment* / methods
Risk Factors
Survival Analysis
United States / epidemiology