Predicting Bacteremia among Septic Patients Based on ED Information by Machine Learning Methods: A Comparative Study

Diagnostics (Basel). 2022 Oct 15;12(10):2498. doi: 10.3390/diagnostics12102498.

Abstract

Introduction: Bacteremia is a common but life-threatening infectious disease. However, a well-defined rule to assess patient risk of bacteremia and the urgency of blood culture is lacking. The aim of this study is to establish a predictive model for bacteremia in septic patients using available big data in the emergency department (ED) through logistic regression and other machine learning (ML) methods.

Material and methods: We conducted a retrospective cohort study at the ED of National Cheng Kung University Hospital in Taiwan from January 2015 to December 2019. Adult ED patients (≥18 years old) who had systemic inflammatory response syndrome and received blood cultures during the ED stay were included. Models I and II were established based on logistic regression; models III and IV were derived from support vector machine (SVM) and random forest (RF). The net reclassification index was used to determine which model was superior.
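As an illustration of how a net reclassification index can compare two risk models, the following is a minimal sketch of the two-category form. The abstract does not say which NRI variant or risk cutoff was used, so the function name, the single `threshold` parameter, and its default of 0.5 are assumptions made only for this example.

```python
def net_reclassification_index(y_true, risk_old, risk_new, threshold=0.5):
    """Two-category NRI of a new model versus an old one.

    y_true   : iterable of 0/1 outcomes (1 = bacteremia)
    risk_old : predicted probabilities from the reference model
    risk_new : predicted probabilities from the comparison model
    threshold: hypothetical cutoff separating low and high risk
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0

    for y, p_old, p_new in zip(y_true, risk_old, risk_new):
        old_high = p_old >= threshold
        new_high = p_new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if y == 1:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down

    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")

    # Events should move up; non-events should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents
```

A positive value means the comparison model reclassifies patients in the correct direction more often than the reference model; a value near zero means neither model is favored.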

Results: During the study period, 437,969 patients visited the study ED, and 40,395 patients were enrolled. Patients diagnosed with bacteremia accounted for 7.7% of the cohort. The area under the receiver operating characteristic curve (AUROC) of models I and II was 0.729 (95% CI, 0.718-0.740) and 0.731 (95% CI, 0.721-0.742), with Akaike information criterion (AIC) values of 16,840 and 16,803, respectively. The performance of model II was superior to that of model I. The AUROC values of models III and IV in the validation dataset were 0.730 (95% CI, 0.713-0.747) and 0.705 (95% CI, 0.688-0.722), respectively. There was no statistical evidence that the model created with logistic regression outperformed those created with SVM and RF.
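For readers unfamiliar with the two metrics reported here, the sketch below shows one standard way each can be computed. It is not the authors' code: the function names are hypothetical, the AUROC uses the pairwise (Mann-Whitney) formulation, and the AIC uses the usual definition of twice the parameter count minus twice the log-likelihood.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood
```

The pairwise loop is quadratic in the number of cases, so it is suitable only for small illustrations; a rank-based implementation would be preferable for a cohort of this size. Because a lower AIC indicates a better trade-off between fit and complexity, the reported values (16,803 versus 16,840) are consistent with model II being preferred over model I.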

Discussion: The advantage of the SVM and RF models is that they are more flexible and not limited to a linear relationship. The advantage of the logistic regression model is that the influence of each independent variable on the response variable is easy to explain. These models could help medical staff identify high-risk patients and prevent unnecessary antibiotic use. The performance of SVM and RF was not inferior to that of logistic regression.

Conclusions: We established models that provide discrimination in predicting bacteremia among patients with sepsis. The reported results could inspire researchers to adopt ML in their development of prediction algorithms.

Keywords: bacteremia; blood culture; logistic regression; machine learning; net reclassification index.