Using the National Trauma Data Bank (NTDB) and machine learning to predict trauma patient mortality at admission

PLoS One. 2020 Nov 17;15(11):e0242166. doi: 10.1371/journal.pone.0242166. eCollection 2020.

Abstract

A 400-estimator gradient boosting classifier was trained to predict survival probabilities of trauma patients. The National Trauma Data Bank (NTDB) provided 799,233 complete patient records (778,303 survivors and 20,930 deaths), each containing 32 features, a number further reduced to 8 features via the permutation importance method. Importantly, the 8 features can all be readily determined at admission: systolic blood pressure, heart rate, respiratory rate, temperature, oxygen saturation, gender, age and Glasgow Coma Scale score. Since death was rare, a rebalanced training set was used to train the model. The model predicts a survival probability for any trauma patient and correctly distinguishes between a patient who died and a patient who survived in 92.4% of all cases. Partial dependence curves (survival probability vs. feature value) obtained from the trained model revealed the global importance of Glasgow Coma Scale score, age, and systolic blood pressure, while heart rate, respiratory rate, temperature, oxygen saturation, and gender had more subtle single-variable influences. Shapley values, which measure the relative contribution of each of the 8 features to individual patient risk, were computed for several patients and quantified patient-specific warning signs. Because the NTDB samples across numerous patient traumas and hospital protocols, the trained model and Shapley values rapidly provide quantitative insight into which combination of variables in an 8-dimensional space contributed most to each trauma patient's predicted global risk of death upon emergency room admission.
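The class imbalance and the 92.4% figure can be made concrete with a small standard-library Python sketch. It rests on two assumptions that the abstract does not confirm: that rebalancing can be illustrated by random undersampling of the majority class (the authors' actual rebalancing method is not stated here), and that the 92.4% figure can be read as an area under the ROC curve, i.e. the fraction of (survivor, deceased) pairs in which the survivor receives the higher predicted survival probability. The function names are hypothetical and chosen for illustration only.

```python
import random

# Record counts reported in the abstract.
N_SURVIVORS = 778303
N_DEATHS = 20930


def mortality_rate(n_survivors, n_deaths):
    """Fraction of records that ended in death."""
    return n_deaths / (n_survivors + n_deaths)


def undersample_majority(labels, rng):
    """Return shuffled indices of a balanced subset.

    Keeps every minority-class row and an equally sized random sample of
    the majority class. This is one common rebalancing scheme, used here
    as an assumption; it is not necessarily what the authors did.
    """
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (ones, zeros) if len(ones) <= len(zeros) else (zeros, ones)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return kept


def pairwise_auc(survival_probs, died):
    """Pairwise form of the ROC AUC.

    Fraction of (survivor, deceased) pairs in which the survivor has the
    higher predicted survival probability; ties count as half.
    """
    survivors = [p for p, d in zip(survival_probs, died) if d == 0]
    deceased = [p for p, d in zip(survival_probs, died) if d == 1]
    if not survivors or not deceased:
        raise ValueError("need at least one survivor and one death")
    wins = 0.0
    for s in survivors:
        for x in deceased:
            if s > x:
                wins += 1.0
            elif s == x:
                wins += 0.5
    return wins / (len(survivors) * len(deceased))


if __name__ == "__main__":
    print(f"mortality rate: {mortality_rate(N_SURVIVORS, N_DEATHS):.4f}")
    toy_labels = [1] * 5 + [0] * 95  # toy data: 1 = died, 0 = survived
    balanced = undersample_majority(toy_labels, random.Random(0))
    print(f"balanced subset size: {len(balanced)}")
```

With the reported counts, deaths make up roughly 2.6% of records, which is why a model trained on the raw data could score well simply by predicting survival for everyone. The pairwise loop is quadratic in the number of patients, so it suits small illustrations only; a rank-based computation would be needed at NTDB scale.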

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Databases, Factual*
  • False Positive Reactions
  • Glasgow Coma Scale
  • Heart Rate
  • Hospitalization*
  • Humans
  • Injury Severity Score
  • Machine Learning*
  • Probability
  • ROC Curve
  • Reproducibility of Results
  • Risk
  • Sex Factors
  • Wounds and Injuries / mortality*