Diagnostic suspicion bias and machine learning: Breaking the awareness deadlock for sepsis detection

PLOS Digit Health. 2023 Nov 1;2(11):e0000365. doi: 10.1371/journal.pdig.0000365. eCollection 2023 Nov.

Abstract

Many early warning algorithms are downstream of clinical evaluation and diagnostic testing, which means that they may not be useful when clinicians fail to suspect illness and fail to order appropriate tests. Depending on how such algorithms handle missing data, they could even indicate "low risk" simply because the tests were never ordered. We considered predictive methodologies to identify sepsis at triage, before diagnostic tests are ordered, in a busy Emergency Department (ED). One algorithm used "bland clinical data" (data available at triage for nearly every patient). The second algorithm added three yes/no questions to be answered after the triage interview. Retrospectively, we studied adult patients from a single ED from 2014 to 2016, separated into training (70%) and testing (30%) cohorts, and a final validation cohort of patients from four EDs from 2016 to 2018. Sepsis was defined per the Rhee criteria. Investigational predictors were demographics and triage vital signs (downloaded from the hospital EMR); past medical history; and the auxiliary queries (answered by chart reviewers who were blinded to all data except the triage note and initial HPI). We developed L2-regularized logistic regression models using greedy forward feature selection. There were 1164, 499, and 784 patients in the training, testing, and validation cohorts, respectively. The bland clinical data model yielded ROC AUCs of 0.78 (0.76-0.81) and 0.77 (0.73-0.81) for training and testing, respectively, and ranged from 0.74 to 0.79 in the four-hospital validation. The second model, which included the auxiliary queries, yielded ROC AUCs of 0.84 (0.82-0.87) and 0.83 (0.79-0.86) for training and testing, respectively, and ranged from 0.78 to 0.83 in the four-hospital validation. The first algorithm did not require clinician input but yielded middling performance. The second showed a trend towards superior performance, though it required additional user effort. These methods are alternatives to predictive algorithms downstream of clinical evaluation and diagnostic testing. For hospital early warning algorithms, consideration should be given to the bias and usability of the various methods.
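
The modelling approach named in the abstract, L2-regularized logistic regression with greedy forward feature selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the selection criterion, stopping rule, or hyperparameters, so the choices below (held-out ROC AUC as the criterion, a `min_gain` threshold, the values of `lam`, `lr`, and `epochs`) are assumptions, and the data are synthetic rather than clinical. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l2_logistic(X, y, lam=0.01, lr=0.5, epochs=150):
    """Batch gradient descent on mean log-loss plus (lam/2)*||w||^2; bias is unpenalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * (gw[j] / n + lam * wj) for j, wj in enumerate(w)]
        b -= lr * gb / n
    return w, b

def predict(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def roc_auc(y, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def columns(X, idx):
    # Keep only the feature columns listed in idx.
    return [[row[j] for j in idx] for row in X]

def greedy_forward_select(X_tr, y_tr, X_val, y_val, min_gain=0.005):
    """Repeatedly add the feature that most improves held-out AUC (assumed criterion)."""
    selected, best_auc = [], 0.5
    remaining = list(range(len(X_tr[0])))
    while remaining:
        trials = []
        for j in remaining:
            idx = selected + [j]
            model = fit_l2_logistic(columns(X_tr, idx), y_tr)
            trials.append((roc_auc(y_val, predict(model, columns(X_val, idx))), j))
        auc, j = max(trials)
        if auc - best_auc < min_gain:  # assumed stopping rule
            break
        selected.append(j)
        remaining.remove(j)
        best_auc = auc
    return selected, best_auc

def make_synthetic(n, seed):
    # Synthetic stand-in for triage data: only features 0 and 2 carry signal.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(5)]
        logit = -1.0 + 1.5 * row[0] + 1.0 * row[2]
        y.append(1 if rng.random() < sigmoid(logit) else 0)
        X.append(row)
    return X, y

X_tr, y_tr = make_synthetic(300, seed=0)
X_val, y_val = make_synthetic(200, seed=1)
selected, auc = greedy_forward_select(X_tr, y_tr, X_val, y_val)
print("selected feature indices:", selected, "held-out AUC:", round(auc, 3))
```

Selecting on a held-out split rather than on the training data is one way to keep the greedy search from favouring features that only fit noise; a cross-validated criterion would be a reasonable alternative.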

Grants and funding

This work was supported in part by a National Defense Science and Engineering Graduate Fellowship (to VP), by the MIT-MGH Strategic Grand Challenge Partnership (to ATR, MRF, and TH), and by grants from the CRICO Risk Management Foundation (to ATR, MRF) and Nihon Kohden Corporation (to ATR, MRF, and TH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.