Central Reading of Ulcerative Colitis Clinical Trial Videos Using Neural Networks

Klaus Gottlieb; James Requa; William Karnes; Ranga Chandra Gudivada; Jie Shen; Efren Rael; Vipin Arora; Tyler Dao; Andrew Ninh; James McGill

doi:10.1053/j.gastro.2020.10.024

Gastroenterology. 2021 Feb;160(3):710-719.e2. doi: 10.1053/j.gastro.2020.10.024. Epub 2020 Oct 21.

Affiliations

¹ Eli Lilly and Company, Indianapolis, Indiana. Electronic address: klaus.gottlieb@lilly.com.
² Docbot Inc, Irvine, California.
³ Eli Lilly and Company, Indianapolis, Indiana.

PMID: 33098883
DOI: 10.1053/j.gastro.2020.10.024

Abstract

Background and aims: Endoscopic disease activity scoring in ulcerative colitis (UC) is useful in clinical practice but done infrequently. It is required in clinical trials, where it is expensive and slow because human central readers are needed. A machine learning algorithm automating the process could elevate clinical care and facilitate clinical research. Prior work using single-institution databases and endoscopic still images has been promising.

Methods: Seven hundred and ninety-five full-length endoscopy videos were prospectively collected from a phase 2 trial of mirikizumab with 249 patients from 14 countries, totaling 19.5 million image frames. Expert central readers assigned each full-length endoscopy video 1 endoscopic Mayo score (eMS) and 1 Ulcerative Colitis Endoscopic Index of Severity (UCEIS) score. Initially, video data were cleaned and abnormality features extracted using convolutional neural networks. Subsequently, a recurrent neural network was trained on the features to predict eMS and UCEIS from individual full-length endoscopy videos.
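
The two-stage design described in the Methods (frame cleaning and per-frame feature extraction, then a recurrent pass over the whole video) can be illustrated with a toy sketch. This is not the authors' model: the function names, the quality threshold, the feature and hidden sizes, and the random untrained weights are all assumptions made for illustration, and the per-frame feature vectors are random placeholders standing in for convolutional network output.

```python
import math
import random


def clean_frames(frames, min_quality=0.5):
    """Keep frames whose quality score reaches the threshold (stand-in for video cleaning)."""
    return [frame for frame in frames if frame["quality"] >= min_quality]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these from central-reader scores."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def predict_video_score(features, w_in, w_rec, w_out):
    """Run a plain recurrent cell over per-frame features and return the top-scoring level."""
    hidden = [0.0] * len(w_rec)
    for x in features:
        # New hidden state depends on the current frame and the previous hidden state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hk for w, hk in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    # Linear readout from the final hidden state to one logit per severity level.
    logits = [sum(w * hj for w, hj in zip(row, hidden)) for row in w_out]
    return max(range(len(logits)), key=lambda level: logits[level])


rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM, NUM_LEVELS = 8, 16, 4  # hypothetical sizes; 4 levels as in eMS 0-3

# Toy "video": each frame has a quality score and a feature vector (random placeholders).
video = [
    {"quality": rng.random(), "features": [rng.random() for _ in range(FEATURE_DIM)]}
    for _ in range(50)
]

kept = clean_frames(video)
features = [frame["features"] for frame in kept]
w_in = random_matrix(HIDDEN_DIM, FEATURE_DIM, rng)
w_rec = random_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
w_out = random_matrix(NUM_LEVELS, HIDDEN_DIM, rng)
score = predict_video_score(features, w_in, w_rec, w_out)
```

Because the weights are untrained, the returned level is meaningless; the sketch only shows how one score per video can be derived from a variable-length sequence of frame features.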

Results: The primary metric to assess the performance of the recurrent neural network model was quadratic weighted kappa (QWK) comparing the agreement of the machine-read endoscopy score with the human central reader score. QWK progressively penalizes disagreements that exceed 1 level. The model's agreement metric was excellent, with a QWK of 0.844 (95% confidence interval, 0.787-0.901) for eMS and 0.855 (95% confidence interval, 0.80-0.91) for UCEIS.
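
For readers unfamiliar with the metric, the sketch below computes QWK from its standard definition: with k ordinal levels, a disagreement between levels i and j receives weight (i - j)^2 / (k - 1)^2, and kappa is 1 minus the ratio of weighted observed disagreement to weighted chance-expected disagreement. The function name and the toy score lists are hypothetical, and this is not the authors' evaluation code.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Quadratic weighted kappa for two lists of integer scores in 0..num_levels-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("score lists must be non-empty and of equal length")

    # Confusion matrix: observed[i][j] counts cases scored i by A and j by B.
    observed = [[0] * num_levels for _ in range(num_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # Marginal histograms give the agreement expected by chance.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_levels)) for j in range(num_levels)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single identical level")
    return 1.0 - weighted_observed / weighted_expected


# Toy example on a 4-level scale such as eMS (0-3); the scores are made up.
human = [0, 1, 2, 3, 0, 1, 2, 3]
machine_off_by_one = [0, 1, 2, 3, 0, 1, 2, 2]
machine_off_by_three = [0, 1, 2, 3, 0, 1, 2, 0]

kappa_near = quadratic_weighted_kappa(human, machine_off_by_one, 4)
kappa_far = quadratic_weighted_kappa(human, machine_off_by_three, 4)
```

In this toy data a single 1-level miss lowers kappa far less than a single 3-level miss, which is the progressive penalty the metric is chosen for.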

Conclusions: We found that a deep learning algorithm can be trained to predict levels of UC severity from full-length endoscopy videos. Our data set was prospectively collected in a multinational clinical trial, videos rather than still images were used, UCEIS and eMS were reported, and machine learning algorithm performance metrics met or exceeded those previously published for UC severity scores.

Keywords: Computer Vision; Efficacy End Points; Endoscopic Scores; Machine Learning.

Publication types

Clinical Trial, Phase II
Comparative Study
Multicenter Study
Randomized Controlled Trial
Research Support, Non-U.S. Gov't

MeSH terms

Adolescent
Adult
Aged
Antibodies, Monoclonal, Humanized / administration & dosage*
Antibodies, Monoclonal, Humanized / adverse effects
Colitis, Ulcerative / diagnosis*
Colitis, Ulcerative / drug therapy
Colon / diagnostic imaging
Colon / drug effects
Colonoscopy / methods*
Deep Learning*
Feasibility Studies
Female
Humans
Image Interpretation, Computer-Assisted / methods*
Intestinal Mucosa / diagnostic imaging
Intestinal Mucosa / drug effects
Male
Middle Aged
Observer Variation
Predictive Value of Tests
Prospective Studies
Severity of Illness Index
Treatment Outcome
Video Recording
Young Adult

Substances

Antibodies, Monoclonal, Humanized
mirikizumab