Central Reading of Ulcerative Colitis Clinical Trial Videos Using Neural Networks

Gastroenterology. 2021 Feb;160(3):710-719.e2. doi: 10.1053/j.gastro.2020.10.024. Epub 2020 Oct 21.

Abstract

Background and aims: Endoscopic disease activity scoring in ulcerative colitis (UC) is useful in clinical practice but is performed infrequently. It is required in clinical trials, where it is expensive and slow because human central readers are needed. A machine learning algorithm that automates the process could improve clinical care and facilitate clinical research. Prior work using single-institution databases and endoscopic still images has been promising.

Methods: Seven hundred and ninety-five full-length endoscopy videos, totaling 19.5 million image frames, were prospectively collected from a phase 2 trial of mirikizumab that enrolled 249 patients from 14 countries. Expert central readers assigned each full-length endoscopy video 1 endoscopic Mayo score (eMS) and 1 Ulcerative Colitis Endoscopic Index of Severity (UCEIS) score. First, the video data were cleaned and abnormality features were extracted using convolutional neural networks. Then, a recurrent neural network was trained on these features to predict the eMS and UCEIS of each full-length endoscopy video.
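The second stage described above, a recurrent network that reads per-frame features and emits one score for the whole video, can be illustrated with a minimal sketch. It assumes the convolutional stage has already produced one feature vector per frame, and it uses a plain Elman RNN cell with random, untrained weights; the abstract does not state which recurrent architecture, feature size, or hidden size the authors used, so `FEATURE_DIM`, `HIDDEN_DIM`, `random_matrix`, and `rnn_video_scores` are all hypothetical. Only the output size follows from the text: eMS has 4 levels (0-3).

```python
import math
import random

# Hypothetical sizes: the abstract does not report feature or hidden dimensions.
FEATURE_DIM = 8   # length of each per-frame CNN feature vector (assumed)
HIDDEN_DIM = 16   # recurrent state size (assumed)
NUM_LEVELS = 4    # eMS levels 0-3


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights standing in for trained parameters (hypothetical)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def rnn_video_scores(frame_features, w_in, w_rec, w_out, b_h, b_out):
    """Run a simple Elman RNN over per-frame features and return one
    probability per score level for the whole video."""
    hidden = [0.0] * len(b_h)
    for x in frame_features:
        # h_t = tanh(W_in x_t + W_rec h_{t-1} + b_h); the new list is built
        # entirely from the previous hidden state before it is reassigned.
        hidden = [
            math.tanh(
                b_h[i]
                + sum(w * xj for w, xj in zip(w_in[i], x))
                + sum(w * hk for w, hk in zip(w_rec[i], hidden))
            )
            for i in range(len(b_h))
        ]
    # Map the final hidden state (a summary of the whole video) to class logits.
    logits = [
        b_out[c] + sum(w * h for w, h in zip(w_out[c], hidden))
        for c in range(len(b_out))
    ]
    # Numerically stable softmax over the score levels.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    w_in = random_matrix(HIDDEN_DIM, FEATURE_DIM, rng)
    w_rec = random_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
    w_out = random_matrix(NUM_LEVELS, HIDDEN_DIM, rng)
    b_h = [0.0] * HIDDEN_DIM
    b_out = [0.0] * NUM_LEVELS
    # Stand-in for the CNN features of a 50-frame video (random, not real data).
    video = [[rng.random() for _ in range(FEATURE_DIM)] for _ in range(50)]
    probs = rnn_video_scores(video, w_in, w_rec, w_out, b_h, b_out)
    predicted_level = max(range(NUM_LEVELS), key=probs.__getitem__)
    print(probs, predicted_level)
```

Because the recurrence consumes frames one at a time, a video of any length collapses to a single fixed-size state, which is what allows one score per full-length video rather than per still image. A UCEIS head would differ only in its number of output levels.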

Results: The primary metric for assessing the performance of the recurrent neural network model was the quadratic weighted kappa (QWK), which measures agreement between the machine-read endoscopy score and the human central reader score. Because its weights grow with the square of the distance between 2 scores, QWK penalizes disagreements of more than 1 level progressively more heavily. The model's agreement with central readers was excellent, with a QWK of 0.844 (95% confidence interval, 0.787-0.901) for eMS and 0.855 (95% confidence interval, 0.80-0.91) for UCEIS.
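The quadratic penalty described above can be made concrete with a short sketch of the standard QWK definition (Cohen's weighted kappa with quadratic weights): kappa = 1 - sum(w_ij * O_ij) / sum(w_ij * E_ij), where O is the observed rater-by-rater count matrix, E is the count expected by chance from each rater's marginal histogram, and w_ij = (i - j)^2 / (K - 1)^2 for K score levels. The abstract does not describe the authors' implementation, so the function name `quadratic_weighted_kappa` is hypothetical, and scores are assumed to be integer levels 0 to K-1 (K = 4 for eMS, K = 9 for the UCEIS total of 0-8).

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Cohen's weighted kappa with quadratic weights for integer scores 0..K-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    if num_levels < 2:
        raise ValueError("need at least 2 score levels")
    n, k = len(rater_a), num_levels
    if any(not 0 <= s < k for s in (*rater_a, *rater_b)):
        raise ValueError("scores must be integers in 0..num_levels-1")
    # Observed matrix: observed[i][j] counts videos scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    # Marginal score histograms of each rater.
    hist_a = [sum(observed[i]) for i in range(k)]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic weight: a 2-level disagreement costs 4x a 1-level one.
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += w * observed[i][j]
            # Expected count under chance agreement (independent marginals).
            weighted_expected += w * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical level")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    machine = [0, 1, 2, 3, 2, 1]   # illustrative scores, not trial data
    reader = [0, 1, 3, 3, 2, 0]
    print(quadratic_weighted_kappa(machine, reader, num_levels=4))
```

Perfect agreement gives 1.0, chance-level agreement gives about 0, and fully reversed scores on a balanced set give -1.0. As a cross-check, scikit-learn's `sklearn.metrics.cohen_kappa_score(y1, y2, labels=list(range(K)), weights="quadratic")` computes the same quantity; passing `labels` keeps the level spacing correct when some score levels are absent. The confidence intervals quoted above are not reproduced here, since the abstract does not say how they were computed.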

Conclusions: We found that a deep learning algorithm can be trained to predict levels of UC severity from full-length endoscopy videos. Our data set was prospectively collected in a multinational clinical trial, full-length videos rather than still images were used, both UCEIS and eMS were reported, and the algorithm's performance metrics met or exceeded those previously published for UC severity scores.

Keywords: Computer Vision; Efficacy End Points; Endoscopic Scores; Machine Learning.

Publication types

  • Clinical Trial, Phase II
  • Comparative Study
  • Multicenter Study
  • Randomized Controlled Trial
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Antibodies, Monoclonal, Humanized / administration & dosage*
  • Antibodies, Monoclonal, Humanized / adverse effects
  • Colitis, Ulcerative / diagnosis*
  • Colitis, Ulcerative / drug therapy
  • Colon / diagnostic imaging
  • Colon / drug effects
  • Colonoscopy / methods*
  • Deep Learning*
  • Feasibility Studies
  • Female
  • Humans
  • Image Interpretation, Computer-Assisted / methods*
  • Intestinal Mucosa / diagnostic imaging
  • Intestinal Mucosa / drug effects
  • Male
  • Middle Aged
  • Observer Variation
  • Predictive Value of Tests
  • Prospective Studies
  • Severity of Illness Index
  • Treatment Outcome
  • Video Recording
  • Young Adult

Substances

  • Antibodies, Monoclonal, Humanized
  • mirikizumab