Validation of a Machine Learning Model That Outperforms Clinical Risk Scoring Systems for Upper Gastrointestinal Bleeding

Gastroenterology. 2020 Jan;158(1):160-167. doi: 10.1053/j.gastro.2019.09.009. Epub 2019 Sep 25.

Abstract

Background & aims: Clinical risk scoring systems are suboptimal for determining risk in patients with upper gastrointestinal bleeding (UGIB); risk stratification might be improved by a machine learning model. We used machine learning to develop a model to calculate the risk of hospital-based intervention or death in patients with UGIB and compared its performance with that of validated clinical risk scoring systems.

Methods: We analyzed data collected from consecutive unselected patients with UGIB from medical centers in 4 countries (the United States, Scotland, England, and Denmark; n = 1958) from March 2014 through March 2015. We used the data to derive and internally validate a gradient-boosting machine learning model to identify patients who met a composite endpoint of hospital-based intervention (transfusion or hemostatic intervention) or death within 30 days. We compared the performance of the machine learning prediction model with that of validated pre-endoscopic clinical risk scoring systems (the Glasgow-Blatchford score [GBS], admission Rockall score, and AIMS65). We externally validated the machine learning model using data from 2 Asia-Pacific sites (Singapore and New Zealand; n = 399). Performance was measured by analysis of the area under the receiver operating characteristic curve (AUC).
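The AUC used as the performance measure has a direct interpretation: it equals the probability that a randomly chosen patient who met the composite endpoint receives a higher risk score than a randomly chosen patient who did not (the normalized Mann-Whitney U statistic). The abstract does not describe the software or the statistical test used to compare AUCs, so the following is only a minimal sketch of that pairwise definition. The function name `auc_mann_whitney` and the toy data are hypothetical and are not taken from the study.

```python
from itertools import product


def auc_mann_whitney(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case (label 1) has the higher score; ties count as one half.

    Hypothetical helper for illustration; not the study's code.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # pair ranked correctly
        elif p == n:
            wins += 0.5  # tie: counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = met the composite endpoint.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]

# 11 of the 12 positive-negative pairs are ranked correctly, so AUC = 11/12.
print(auc_mann_whitney(scores, labels))
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration; a rank-based formula gives the same value more efficiently on cohorts of the size described here.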

Results: The machine learning model identified patients who met the composite endpoint with an AUC of 0.91 in the internal validation set; the clinical scoring systems identified patients who met the composite endpoint with AUC values of 0.88 for the GBS (P = .001), 0.73 for Rockall score (P < .001), and 0.78 for AIMS65 score (P < .001). In the external validation cohort, the machine learning model identified patients who met the composite endpoint with an AUC of 0.90, the GBS with an AUC of 0.87 (P = .004), the Rockall score with an AUC of 0.66 (P < .001), and the AIMS65 with an AUC of 0.64 (P < .001). At cutoff scores at which the machine learning model and GBS identified patients who met the composite endpoint with 100% sensitivity, the specificity values were 26% with the machine learning model versus 12% with GBS (P < .001).
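The specificity comparison at 100% sensitivity can be illustrated with a small sketch. It assumes a rule of the form "score ≥ cutoff means high risk" and picks the highest cutoff that still flags every patient who met the endpoint; specificity is then the share of patients without the endpoint who fall below that cutoff (that is, who would be classified as low risk). The abstract does not state how the cutoffs were selected, so this rule, the function name `specificity_at_full_sensitivity`, and the toy data are all hypothetical.

```python
def specificity_at_full_sensitivity(scores, labels):
    """Return (cutoff, specificity) for the rule 'score >= cutoff means
    high risk', using the highest cutoff that keeps sensitivity at 100%.

    Hypothetical helper for illustration; not the study's procedure.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Any cutoff above the lowest-scoring positive case would miss that case.
    cutoff = min(pos)
    # Negatives scoring below the cutoff are correctly classified as low risk.
    true_negatives = sum(1 for s in neg if s < cutoff)
    return cutoff, true_negatives / len(neg)


# Hypothetical toy data (not from the study): 1 = met the composite endpoint.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]

# Cutoff 0.4 flags all 3 positives; 3 of the 4 negatives fall below it.
print(specificity_at_full_sensitivity(scores, labels))
```

Higher specificity at the same 100% sensitivity means more patients without the endpoint fall into the low-risk group. This sketch chooses the cutoff on the same data it evaluates; checking sensitivity on a separate cohort, as the external validation here does for AUC, is the more conservative design.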

Conclusions: We developed a machine learning model that identifies patients with UGIB who meet a composite endpoint of hospital-based intervention or death within 30 days with a greater AUC, and with higher specificity at 100% sensitivity, than validated clinical risk scoring systems. This model could increase identification of low-risk patients who can be safely discharged from the emergency department for outpatient management.

Keywords: Artificial Intelligence; Mortality; Prediction; Prognostic Factor.

Publication types

  • Research Support, N.I.H., Extramural
  • Validation Study

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Blood Transfusion / statistics & numerical data
  • Emergency Service, Hospital / statistics & numerical data
  • Female
  • Gastrointestinal Hemorrhage / diagnosis*
  • Gastrointestinal Hemorrhage / therapy
  • Hemostatic Techniques / statistics & numerical data
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Models, Biological*
  • Prognosis
  • ROC Curve
  • Risk Assessment / methods