Multicenter validation of a machine-learning algorithm for 48-h all-cause mortality prediction

Hamid Mohamadlou; Saarang Panchavati; Jacob Calvert; Anna Lynn-Palevsky; Sidney Le; Angier Allen; Emily Pellegrini; Abigail Green-Saxena; Christopher Barton; Grant Fletcher; Lisa Shieh; Philip B Stark; Uli Chettipally; David Shimabukuro; Mitchell Feldman; Ritankar Das

doi:10.1177/1460458219894494

Health Informatics J. 2020 Sep;26(3):1912-1925. doi: 10.1177/1460458219894494. Epub 2019 Dec 30.

Affiliations

¹ Dascena, Inc., USA.
² University of California San Francisco, USA.
³ University of Washington, USA.
⁴ Stanford University, USA.
⁵ University of California, Berkeley, USA.
⁶ University of California San Francisco, USA; Kaiser Permanente South San Francisco Medical Center, USA.

PMID: 31884847

Abstract

To evaluate mortality predictions based on gradient-boosted trees (GBT), this retrospective study used electronic medical record data from three academic health centers for inpatients aged 18 years or older with at least one observation of each vital sign. Predictions were made 12, 24, and 48 hours before death. Models fit to training data from each institution were evaluated on hold-out test data from the same institution and from the other institutions. GBT predictions were compared with regularized logistic regression (LR), support vector machine (SVM), quick Sepsis-Related Organ Failure Assessment (qSOFA), and Modified Early Warning Score (MEWS) predictions using the area under the receiver operating characteristic curve (AUROC). When GBT was trained and tested on data from the same institution, the average AUROCs across institutional test sets were 0.96, 0.95, and 0.94 for 12-, 24-, and 48-hour predictions, respectively. When trained and tested on data from different hospitals, GBT achieved AUROCs of up to 0.98, 0.96, and 0.96 for 12-, 24-, and 48-hour predictions, respectively. The average AUROCs for 48-hour predictions for LR, SVM, MEWS, and qSOFA were 0.85, 0.79, 0.86, and 0.82, respectively. GBT predictions may help identify patients who would benefit from increased clinical care.
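
As a rough illustration of the comparison metric, the sketch below scores one of the baselines named in the abstract (qSOFA) and computes its AUROC against a binary mortality label. It is not the study's code. The patient records are invented for illustration and are not study data, and the function names `qsofa_score` and `auroc` are hypothetical. The qSOFA thresholds used are the standard ones (respiratory rate of at least 22 breaths/min, systolic blood pressure of at most 100 mmHg, Glasgow Coma Scale below 15); AUROC is computed as the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from typing import Sequence


def qsofa_score(resp_rate: float, systolic_bp: float, gcs: int) -> int:
    """Return the qSOFA score (0-3): one point per criterion met."""
    return int(resp_rate >= 22) + int(systolic_bp <= 100) + int(gcs < 15)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case outranks
    a random negative case; tied pairs contribute 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented records for illustration only (NOT study data):
# (respiratory rate, systolic BP, GCS, died within 48 h)
patients = [
    (16, 120, 15, 0),
    (24, 95, 14, 1),
    (18, 110, 15, 0),
    (22, 130, 15, 0),
    (28, 85, 15, 1),
    (20, 98, 15, 1),
]

scores = [qsofa_score(rr, sbp, gcs) for rr, sbp, gcs, _ in patients]
labels = [died for *_, died in patients]
print(f"qSOFA AUROC on toy data: {auroc(labels, scores):.3f}")
```

The same `auroc` function could be applied to the predicted probabilities of any model, which is what makes AUROC usable for comparing learned models such as GBT with integer-valued bedside scores such as qSOFA and MEWS.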

Keywords: electronic health record; machine learning; mortality; prediction.

Publication types

Multicenter Study
Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Hospital Mortality
Humans
Machine Learning*
Retrospective Studies
Sepsis*

Grants and funding

R43 NR015945/NR/NINR NIH HHS/United States