Regression and Random Forest Machine Learning Have Limited Performance in Predicting Bowel Preparation in Veteran Population

Dig Dis Sci. 2022 Jul;67(7):2827-2841. doi: 10.1007/s10620-021-07113-z. Epub 2021 Jun 24.

Abstract

Background: Inadequate bowel preparation undermines the quality of colonoscopy, but patients likely to be affected are difficult to identify beforehand.

Aims: This study aimed to develop, validate, and compare prediction models for bowel preparation inadequacy using conventional logistic regression (LR) and random forest machine learning (RFML).

Methods: We created a retrospective cohort of patients who underwent outpatient colonoscopy at a single VA medical center between January 2012 and October 2015. Candidate predictor variables were chosen after a literature review. We extracted all available predictor variables from the electronic medical record, and bowel preparation from the endoscopy database. The data were split into 70% training and 30% validation sets. Multivariable LR and RFML were used to predict preparation inadequacy as a dichotomous outcome.
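
The 70%/30% partition described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say whether the split was random, stratified, or seeded, so the simple shuffled split, the `split_70_30` helper, the seed, and the placeholder patient IDs are all assumptions.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of `records` and split it 70% training / 30% validation.

    Hypothetical helper: a plain random split is assumed here because the
    abstract does not describe how the partition was drawn.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for the 6,885 Veterans in the cohort.
cohort_ids = list(range(6885))
train_ids, valid_ids = split_70_30(cohort_ids)
print(len(train_ids), len(valid_ids))  # prints: 4819 2066
```

Under this assumed split, both models would be fitted on the training portion only, and the validation portion would be held back for the AUC and Brier score estimates.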

Results: The cohort included 6,885 Veterans, of whom 964 (14%) had inadequate preparation. Using LR, the area under the receiver operating characteristic curve (AUC) for the validation cohort was 0.66 (95% CI 0.62, 0.69) and the Brier score, in which a lower score indicates better performance, was 0.11. Using RFML, the AUC for the validation cohort was 0.61 (95% CI 0.58, 0.65) and the Brier score was 0.12.
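
The two metrics reported above can be computed as in the sketch below, which is offered as an illustration and not as the study's analysis code. The toy labels and probabilities are invented for the example. The last lines compute the Brier score of a constant prediction equal to the cohort prevalence (964 of 6,885) as a rough reference point, on the assumption that the validation set had a similar prevalence, which the abstract does not report.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as the probability that a randomly chosen positive case is scored
    above a randomly chosen negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = inadequate preparation, 0 = adequate preparation.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.1, 0.2, 0.7, 0.3, 0.4, 0.05, 0.6, 0.8]

toy_auc = auc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)

# Reference point: predicting the cohort prevalence for every patient gives
# a Brier score of p * (1 - p). Assumes validation prevalence matches the cohort.
prevalence = 964 / 6885
no_skill_brier = prevalence * (1 - prevalence)
print(round(no_skill_brier, 3))  # prints: 0.12
```

With an outcome prevalence near 14%, this constant-prediction reference is about 0.12, so Brier scores of 0.11 and 0.12 are best read alongside the AUC values and not on their own.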

Conclusions: LR and RFML performed similarly in predicting inadequate bowel preparation; the performance of both was modest and likely insufficient for use in practice. Future research is needed to identify additional predictor variables and to test other machine learning algorithms. At present, endoscopy units should focus on universal strategies to enhance preparation adequacy.

Keywords: Bowel preparation; Colonoscopy; Healthcare quality; Prediction models; Random forest machine learning; Veterans health.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Humans
  • Logistic Models
  • Machine Learning
  • Retrospective Studies
  • Risk Assessment
  • Veterans*