Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably

J Clin Epidemiol. 2021 May:133:43-52. doi: 10.1016/j.jclinepi.2020.12.018. Epub 2021 Jan 22.

Abstract

Objective: The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records.

Study design and setting: We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015-2017. One-year mortality was predicted from patient age, sex, and socioeconomic status, and from 202 to 257 International Classification of Diseases 10th Revision codes, each entered as a binary predictor indicating whether the code was recorded in the preceding year. Performance measures included the c-statistic, scaled Brier score, and several measures of calibration.
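
As an illustration of the two headline performance measures named above, the following is a minimal sketch, not the study's own code. It assumes the c-statistic is computed as the proportion of concordant death/survivor pairs (ties counted as one half) and that the scaled Brier score is 1 - Brier / Brier_max, with Brier_max taken from a reference model that predicts the observed mortality rate for every patient. The abstract does not state the exact formulas used, and the function names and toy data are hypothetical.

```python
def c_statistic(y_true, y_prob):
    """Share of (death, survivor) pairs in which the death has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    deaths = [p for y, p in zip(y_true, y_prob) if y == 1]
    survivors = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for p_death in deaths:
        for p_survivor in survivors:
            if p_death > p_survivor:
                concordant += 1.0
            elif p_death == p_survivor:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


def scaled_brier(y_true, y_prob):
    """1 - Brier / Brier_max, where Brier_max is the Brier score of a
    reference model predicting the observed event rate for everyone
    (an assumed definition, see the lead-in)."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    event_rate = sum(y_true) / n
    brier_max = event_rate * (1.0 - event_rate)
    return 1.0 - brier / brier_max


# Toy data (hypothetical): 1 = died within one year, 0 = survived.
outcomes = [1, 1, 0, 0]
predicted_risk = [0.9, 0.6, 0.7, 0.1]

print(c_statistic(outcomes, predicted_risk))
print(scaled_brier(outcomes, predicted_risk))
```

The pairwise loop is quadratic in the number of patients, so for cohorts of the size analyzed here a rank-based implementation would be the practical choice. The pairwise form is shown only because it mirrors the definition directly.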

Results: One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal cancer surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817) for myocardial infarction, hip fracture, and colorectal cancer surgery, respectively. The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall.

Conclusion: In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.

Keywords: Big data; Comorbidity; Electronic health records; International Classification of Diseases; Machine learning; Prognosis; Regression analysis.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Age Factors
  • Aged
  • Aged, 80 and over
  • Colorectal Neoplasms / epidemiology
  • Colorectal Neoplasms / mortality*
  • Electronic Health Records / statistics & numerical data
  • England / epidemiology
  • Female
  • Forecasting
  • Hip Fractures / epidemiology
  • Hip Fractures / mortality*
  • Humans
  • International Classification of Diseases*
  • Logistic Models*
  • Machine Learning*
  • Male
  • Middle Aged
  • Mortality / trends*
  • Myocardial Infarction / epidemiology
  • Myocardial Infarction / mortality*
  • Sex Factors