Comparing Machine Learning to Regression Methods for Mortality Prediction Using Veterans Affairs Electronic Health Record Clinical Data

Bocheng Jing; W John Boscardin; W James Deardorff; Sun Young Jeon; Alexandra K Lee; Anne L Donovan; Sei J Lee

doi:10.1097/MLR.0000000000001720


Med Care. 2022 Jun 1;60(6):470-479. doi: 10.1097/MLR.0000000000001720. Epub 2022 Mar 30.

Authors

Bocheng Jing¹,²,³, W John Boscardin¹,³,⁴, W James Deardorff³, Sun Young Jeon¹,³, Alexandra K Lee¹,³, Anne L Donovan⁵, Sei J Lee¹,³

Affiliations

¹ San Francisco VA Health Care System.
² Northern California Institute for Research and Education.
³ Division of Geriatrics.
⁴ Departments of Epidemiology and Biostatistics.
⁵ Anesthesia and Perioperative Medicine, University of California, San Francisco, San Francisco, CA.

Abstract

Background: It is unclear whether machine learning methods yield more accurate electronic health record (EHR) prediction models compared with traditional regression methods.

Objective: The objective of this study was to compare machine learning and traditional regression models for 10-year mortality prediction using EHR data.

Design: This was a cohort study.

Setting: Veterans Affairs (VA) EHR data.

Participants: Veterans aged over 50 with a primary care visit in 2005, divided into separate training and testing cohorts (n = 124,360 each).
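The abstract states only that the two cohorts were separate and equal in size (2 × 124,360 = 248,720 patients); it does not describe how patients were assigned. A minimal sketch, assuming a simple random 50/50 split at the patient level, might look like this in Python; `split_cohort`, the seed value, and the integer patient IDs are hypothetical.

```python
import random


def split_cohort(patient_ids, seed=2005):
    """Randomly assign each patient to a training or testing cohort.

    Assumption: a simple random 50/50 patient-level split; the study's
    actual assignment method is not stated in the abstract.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # hypothetical fixed seed, for a reproducible split
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical patient IDs 0..248,719 (2 x 124,360 patients).
train_ids, test_ids = split_cohort(range(248_720))
print(len(train_ids), len(test_ids))  # 124360 124360
```

Splitting by patient rather than by visit keeps any one veteran's records from appearing in both cohorts.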

Measurements and analytic methods: The primary outcome was 10-year all-cause mortality. We considered 924 potential predictors across a wide range of EHR data elements including demographics (3), vital signs (9), medication classes (399), disease diagnoses (293), laboratory results (71), and health care utilization (149). We compared discrimination (c-statistics), calibration metrics, and diagnostic test characteristics (sensitivity, specificity, and positive and negative predictive values) of machine learning and regression models.
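The metrics named above can be illustrated on toy data. The Python sketch below assumes `y` holds observed 10-year deaths (1 = died) and `p` holds predicted risks; the function names, the 0.5 risk threshold, and the grouped calibration table are illustrative assumptions, not the study's reported procedure.

```python
# Toy illustration: y = observed 10-year death (1/0), p = predicted risk.
# Assumes both outcome classes are present in y.

def c_statistic(y, p):
    """Probability that a random decedent has a higher predicted risk than a
    random survivor; tied pairs count 1/2. O(n^2), for illustration only."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    hits = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return hits / (len(pos) * len(neg))


def diagnostic_characteristics(y, p, threshold):
    """Sensitivity, specificity, PPV and NPV, calling risk >= threshold 'positive'."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # undefined if no one is called positive
        "npv": tn / (tn + fn),  # undefined if no one is called negative
    }


def calibration_table(y, p, n_groups=10):
    """(mean predicted risk, observed death rate) within risk-ordered groups."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    rows = []
    for g in range(n_groups):
        idx = order[g * len(order) // n_groups:(g + 1) * len(order) // n_groups]
        if idx:  # skip empty groups when there are fewer patients than groups
            rows.append((sum(p[i] for i in idx) / len(idx),
                         sum(y[i] for i in idx) / len(idx)))
    return rows


y = [1, 1, 0, 0, 1, 0]
p = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
print(round(c_statistic(y, p), 3))          # 0.889 (8 of 9 pairs concordant)
print(diagnostic_characteristics(y, p, 0.5))  # each value is 2/3 here
print(calibration_table(y, p, n_groups=2))  # about [(0.233, 0.333), (0.733, 0.667)]
```

The pairwise c-statistic is quadratic in cohort size; for cohorts of 124,360 patients, the equivalent rank-based (Mann-Whitney) computation would be the practical choice. A well-calibrated model is one whose mean predicted risk tracks the observed death rate in every group of the calibration table.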

Results: The cohort's mean age (SD) was 68.2 (10.5) years; 93.9% were male, and 39.4% died within 10 years. Testing-cohort c-statistics ranged from 0.827 to 0.837. Utilizing all 924 predictors, the Gradient Boosting model yielded the highest c-statistic [0.837, 95% confidence interval (CI): 0.835-0.839]. The full (unselected) logistic regression model had the highest c-statistic of regression models (0.833, 95% CI: 0.830-0.835) but showed evidence of overfitting. The discrimination of the stepwise selection logistic model (101 predictors) was similar (0.832, 95% CI: 0.830-0.834) with minimal overfitting. All models were well-calibrated and had similar diagnostic test characteristics.
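The 95% confidence intervals reported above could be obtained in several ways, and the abstract does not say which the authors used. As one common option, here is a percentile bootstrap sketch in Python; `bootstrap_ci`, the seed values, and the synthetic data are hypothetical.

```python
import random


def c_statistic(y, p):
    """Pairwise concordance between decedents (y=1) and survivors (y=0); ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    hits = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return hits / (len(pos) * len(neg))


def bootstrap_ci(y, p, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the c-statistic, resampling patients with replacement.

    Hypothetical helper: one common CI method, not necessarily the authors'.
    """
    n = len(y)
    if sum(y) in (0, n):
        raise ValueError("y must contain both decedents and survivors")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # a resample needs both outcomes to define concordance
            stats.append(c_statistic(yb, [p[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic data: risks drawn uniformly, outcomes drawn from those risks.
gen = random.Random(0)
p = [gen.random() for _ in range(200)]
y = [1 if gen.random() < pi else 0 for pi in p]
print(round(c_statistic(y, p), 3), bootstrap_ci(y, p, n_boot=200))
```

Applying the same `c_statistic` to the training and testing cohorts and comparing the two values is one common way to gauge overfitting (a training value well above the testing value); the abstract does not specify which approach the authors used.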

Limitation: Our results should be confirmed in non-VA EHRs.

Conclusion: The differences in c-statistic between the best machine learning model (924-predictor Gradient Boosting) and 101-predictor stepwise logistic models for 10-year mortality prediction were modest, suggesting stepwise regression methods continue to be a reasonable method for VA EHR mortality prediction model development.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Cohort Studies
Electronic Health Records*
Female
Humans
Machine Learning
Male
Regression Analysis
Veterans*
