Impact of study design on adenoma detection in the evaluation of artificial intelligence-aided colonoscopy: a systematic review and meta-analysis

Gastrointest Endosc. 2024 May;99(5):676-687.e16. doi: 10.1016/j.gie.2024.01.021. Epub 2024 Jan 24.

Abstract

Background and aims: Randomized controlled trials (RCTs) have reported that artificial intelligence (AI) improves endoscopic polyp detection. Two methodologies, parallel and tandem designs, have been used to evaluate the efficacy of AI-assisted colonoscopy in RCTs. Systematic reviews and meta-analyses have reported a pooled effect that includes both study designs. However, it is unclear whether there are inconsistencies in the reported results of these 2 designs. Here, we aimed to determine whether study characteristics moderate between-trial differences in outcomes when evaluating the effectiveness of AI-assisted polyp detection.

Methods: A systematic search of Ovid MEDLINE, Embase, Cochrane Central, Web of Science, and IEEE Xplore was performed through March 1, 2023, for RCTs comparing AI-assisted colonoscopy with routine high-definition colonoscopy in polyp detection. The primary outcome of interest was the impact of study type on the adenoma detection rate (ADR). Secondary outcomes included the impact of the study type on adenomas per colonoscopy and withdrawal time, as well as the impact of geographic location, AI system, and endoscopist experience on ADR. Pooled event analysis was performed using a random-effects model.
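For readers unfamiliar with random-effects pooling, the following is a minimal sketch of one common estimator (DerSimonian-Laird) applied to risk ratios. The abstract does not state which estimator or software the authors used, so the choice of method, the function name pooled_risk_ratio, and the trial counts in the usage lines are all illustrative assumptions, not data from the included RCTs.

```python
import math

def pooled_risk_ratio(trials, z=1.959964):
    """Pool risk ratios with a DerSimonian-Laird random-effects model.

    trials: list of (events_ai, n_ai, events_ctrl, n_ctrl) tuples.
    Assumes every trial has at least one event in each arm.
    """
    # Per-trial log risk ratio and its approximate variance.
    y, v = [], []
    for a, n1, c, n2 in trials:
        y.append(math.log((a / n1) / (c / n2)))
        v.append(1 / a - 1 / n1 + 1 / c - 1 / n2)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(trials) - 1

    # Between-trial variance (tau^2), truncated at zero.
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_term) if c_term > 0 else 0.0

    # Random-effects weights add tau^2 to each trial's variance.
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "rr": math.exp(y_re),
        "ci_low": math.exp(y_re - z * se),
        "ci_high": math.exp(y_re + z * se),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical counts, for illustration only (not from the review).
example = pooled_risk_ratio([(40, 100, 30, 100), (55, 150, 45, 150)])
```

When the trials agree closely, tau^2 shrinks to zero and the result matches a fixed-effect analysis; larger disagreement widens the confidence interval.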

Results: Twenty-four RCTs involving 17,413 colonoscopies (AI assisted: 8680; non-AI assisted: 8733) were included. AI-assisted colonoscopy improved overall ADR (risk ratio [RR], 1.24; 95% confidence interval [CI], 1.17-1.31; I² = 53%; P < .001). Tandem studies collectively demonstrated improved ADR in AI-aided colonoscopies (RR, 1.18; 95% CI, 1.08-1.30; I² = 0%; P < .001), as did parallel studies (RR, 1.26; 95% CI, 1.17-1.35; I² = 62%; P < .001), with no statistically significant subgroup difference between study designs. Both tandem and parallel study designs revealed improvement in adenomas per colonoscopy in AI-aided colonoscopies, but this improvement was more marked among tandem studies (P < .001). AI assistance significantly increased withdrawal times for parallel (P = .002), but not tandem, studies. In a subgroup analysis, ADR improvement was more marked among studies conducted in Asia than among those conducted in Europe and North America (P = .007). Neither the type of AI system used nor endoscopist experience affected the overall improvement in ADR.
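The absence of a subgroup difference in ADR can be checked approximately from the figures reported above. The sketch below recovers each subgroup's standard error from its 95% CI on the log scale and applies a two-sided z test to the difference in log risk ratios. This is a back-of-envelope reconstruction from rounded values, not the authors' analysis, and the function names are illustrative.

```python
import math

def log_rr_and_se(rr, ci_low, ci_high, z=1.959964):
    """Return the log risk ratio and its standard error from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(rr), se

def subgroup_difference(group_a, group_b):
    """Two-sided z test for a difference between two pooled log risk ratios."""
    ya, sa = log_rr_and_se(*group_a)
    yb, sb = log_rr_and_se(*group_b)
    z_stat = (yb - ya) / math.sqrt(sa ** 2 + sb ** 2)
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return z_stat, p_value

# Pooled ADR estimates as reported in the Results (RR, CI low, CI high).
tandem = (1.18, 1.08, 1.30)
parallel = (1.26, 1.17, 1.35)

z_stat, p_value = subgroup_difference(tandem, parallel)
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p value falls above the conventional .05 threshold, which is consistent with the reported lack of a subgroup difference between designs.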

Conclusions: Either parallel or tandem study design can capture the improvement in ADR resulting from the use of AI-assisted polyp detection systems. Tandem studies powered to detect differences in endoscopic performance through paired comparison may be a resource-efficient method of evaluating new AI-assisted technologies.

Publication types

  • Review