Machine learning approaches to investigate Clostridioides difficile infection and outcomes: A systematic review

Int J Med Inform. 2022 Apr:160:104706. doi: 10.1016/j.ijmedinf.2022.104706. Epub 2022 Jan 31.

Abstract

Objectives: Machine learning (ML) has been increasingly used in clinical medicine, including in studies focused on Clostridioides difficile infection (CDI), to inform clinical decision making. We aimed to summarize ML choices in studies that used ML to predict CDI or CDI outcomes.

Methods: We searched Ovid MEDLINE, Ovid EMBASE, Web of Science, medRxiv, bioRxiv and arXiv from inception to March 18, 2021. We included fully published studies that used ML where CDI constituted the study population, exposure or outcome. Two reviewers independently identified studies and abstracted outcomes. We summarized study characteristics and approaches to CDI definition and ML-specific modelling.

Results: Forty-three studies of prediction (n = 21), classification (n = 17) or inference (n = 5) were included. Approaches to defining CDI were labelling during a clinical study or chart review (n = 21), electronic phenotyping (n = 13) or not specified (n = 9). None of the studies using an electronic phenotype described phenotype validation. Almost all studies (n = 41, 95%) conducted supervised ML and the most common ML algorithms were penalized logistic regression (n = 20, 47%) and classification tree (n = 17, 40%). Approaches to feature selection and dimension reduction were heterogeneous. Metrics were evaluated in a held-out test set in 16 (37%) studies; only seven used a time-based split. In terms of reporting quality assessment, the most poorly reported items were data leakage prevention (n = 0, 0%), code availability (n = 8, 19%) and class imbalance management (n = 12, 43%).
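As an illustration of the time-based split mentioned above, the following is a minimal sketch, not taken from any of the reviewed studies. It assumes each record carries an admission date; the field names (`admitted`, `cdi`), the toy records and the cutoff date are all hypothetical. Training on records before the cutoff and testing on records on or after it keeps later data out of model fitting, which is one simple guard against temporal data leakage.

```python
from datetime import date


def time_based_split(records, cutoff):
    """Return (train, test): records admitted before the cutoff go to train,
    records admitted on or after the cutoff go to the held-out test set."""
    train = [r for r in records if r["admitted"] < cutoff]
    test = [r for r in records if r["admitted"] >= cutoff]
    return train, test


# Hypothetical toy admissions; field names and values are illustrative only.
records = [
    {"id": 1, "admitted": date(2018, 3, 1), "cdi": 0},
    {"id": 2, "admitted": date(2019, 7, 15), "cdi": 1},
    {"id": 3, "admitted": date(2020, 1, 10), "cdi": 0},
    {"id": 4, "admitted": date(2020, 11, 5), "cdi": 1},
]

train, test = time_based_split(records, cutoff=date(2020, 1, 1))
```

With a split like this, any feature selection, scaling or resampling for class imbalance would be fitted on `train` only and then applied unchanged to `test`.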

Conclusions: While many studies have used ML to investigate CDI or CDI outcomes, electronic phenotyping of CDI was uncommon and phenotype validation was not reported in any study. Methodological approaches were heterogeneous. Validating CDI electronic phenotypes, evaluating the performance of CDI models during a silent trial and deploying a CDI classifier to guide clinical practice are important future goals.

Keywords: Clostridioides difficile Infection; Electronic health record; Machine learning; Systematic review.

Publication types

  • Review
  • Systematic Review

MeSH terms

  • Clostridioides difficile*
  • Clostridium Infections* / diagnosis
  • Clostridium Infections* / drug therapy
  • Clostridium Infections* / epidemiology
  • Forecasting
  • Humans
  • Logistic Models
  • Machine Learning