Automated identification and assignment of colonoscopy surveillance recommendations for individuals with colorectal polyps

Gastrointest Endosc. 2021 Nov;94(5):978-987. doi: 10.1016/j.gie.2021.05.036. Epub 2021 Jun 1.

Abstract

Background and aims: Determining surveillance intervals for patients with colorectal polyps is critical but time-consuming and challenging to do reliably. We present the development and assessment of a pipeline that leverages natural language processing techniques to automatically extract and analyze relevant polyp findings from free-text colonoscopy and pathology reports. Using this information, we categorized individual patients into 6 postcolonoscopy surveillance intervals defined by the U.S. Multi-Society Task Force on Colorectal Cancer.

Methods: Working with a set of 546 randomly selected colonoscopy and pathology reports from 324 patients in a single health system, we combined statistical classifiers and rule-based methods to extract polyp properties from each report type, associate properties with unique polyps, and classify each patient into 1 of 6 risk categories by integrating information from both report types. We then assessed the pipeline's performance by determining the positive predictive value (PPV), sensitivity, and F-score of the algorithm, compared with the determination of surveillance intervals by a gastroenterologist.
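The three evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not describe how extracted entities were matched against the reference standard, so the exact-match comparison of (polyp, property, value) tuples, the function name `extraction_metrics`, and the toy data below are all assumptions made for illustration, not the authors' evaluation code.

```python
def extraction_metrics(predicted, gold):
    """Return (PPV, sensitivity, F-score) for two collections of extracted entities.

    Assumption: an entity counts as correct only if it matches a gold entity exactly.
    """
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # extracted and present in the reference
    fp = len(predicted - gold)   # extracted but not in the reference
    fn = len(gold - predicted)   # in the reference but missed

    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = ppv + sensitivity
    # F-score is the harmonic mean of PPV and sensitivity.
    f_score = 2 * ppv * sensitivity / denom if denom else 0.0
    return ppv, sensitivity, f_score


# Hypothetical toy data: (polyp id, property, value) tuples.
gold = {
    ("polyp1", "size", "8 mm"),
    ("polyp1", "location", "sigmoid"),
    ("polyp2", "size", "3 mm"),
    ("polyp2", "location", "cecum"),
}
predicted = {
    ("polyp1", "size", "8 mm"),
    ("polyp1", "location", "sigmoid"),
    ("polyp2", "size", "3 mm"),
    ("polyp2", "location", "rectum"),  # wrong location: one FP and one FN
}

ppv, sensitivity, f_score = extraction_metrics(predicted, gold)
print(ppv, sensitivity, f_score)  # 0.75 0.75 0.75
```

In this toy case three of four extracted entities are correct and three of four reference entities are recovered, so all three metrics equal 0.75.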

Results: The pipeline was developed using 346 reports (224 colonoscopy and 122 pathology) from 224 patients and evaluated on an independent test set of 200 reports (100 colonoscopy and 100 pathology) from 100 patients. Across targeted entities, colonoscopy extraction achieved an average PPV, sensitivity, and F-score of .92, .95, and .93, respectively. Pathology extraction achieved a PPV, sensitivity, and F-score of .95, .97, and .96, respectively. The system achieved an overall accuracy of 92% in assigning the recommended interval for surveillance colonoscopy.
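As a quick arithmetic check, the reported F-scores can be compared with the harmonic mean of the reported PPV and sensitivity. This is only a sketch: the colonoscopy figures are averages across entities, and an averaged F-score need not equal the harmonic mean of the averaged PPV and sensitivity, although here the two agree to two decimal places. The helper name `f_score` is an assumption for illustration.

```python
def f_score(ppv, sensitivity):
    """Harmonic mean of PPV and sensitivity (the balanced F-score)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Values as reported in the Results paragraph.
colonoscopy_f = f_score(0.92, 0.95)
pathology_f = f_score(0.95, 0.97)

print(round(colonoscopy_f, 2), round(pathology_f, 2))  # 0.93 0.96
```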

Conclusions: This study demonstrates the feasibility of using machine learning to automatically extract findings and classify patients to appropriate risk categories and corresponding surveillance intervals. Incorporating this system can facilitate proactive and timely follow-up after screening colonoscopy and enable real-time quality assessment of prevention programs and providers.

MeSH terms

  • Colonic Polyps* / diagnostic imaging
  • Colonoscopy
  • Colorectal Neoplasms* / diagnosis
  • Gastroenterologists*
  • Humans
  • Mass Screening
  • Natural Language Processing