Performance comparison between two computer-aided detection colonoscopy models by trainees using different false positive thresholds: a cross-sectional study in Thailand

Kasenee Tiankanon; Julalak Karuehardsuwan; Satimai Aniwan; Parit Mekaroonkamol; Panukorn Sunthornwechapong; Huttakan Navadurong; Kittithat Tantitanawat; Krittaya Mekritthikrai; Salin Samutrangsi; Peerapon Vateekul; Rungsun Rerknimitr

doi:10.5946/ce.2023.145

Clin Endosc. 2024 Mar;57(2):217-225. doi: 10.5946/ce.2023.145. Epub 2024 Feb 7.

Affiliations

¹ Division of Gastroenterology, Department of Medicine, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Thai Red Cross, Bangkok, Thailand.
² Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand.

Abstract

Background/aims: This study aims to compare the polyp detection performance of "Deep-GI," a newly developed artificial intelligence (AI) model, with that of a previously validated computer-aided polyp detection (CADe) model at various false positive (FP) thresholds, and to determine the best threshold for each model.

Methods: Colonoscopy videos were collected prospectively and reviewed by three expert endoscopists (gold standard), trainees, CADe (CAD EYE; Fujifilm Corp.), and Deep-GI. Polyp detection sensitivity (PDS), polyp miss rates (PMR), and false-positive alarm rates (FPR) were compared among the three groups using different FP thresholds for the duration of bounding boxes appearing on the screen.
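The threshold logic described in the Methods can be illustrated with a minimal sketch. It assumes that a false-positive alarm is counted only when its bounding box stays on screen for at least the threshold duration, and that PMR is the complement of PDS (consistent with the reported pairs, e.g., 99.4% and 0.6%). The function names and the example durations are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def count_false_alarms(durations_s: Sequence[float], threshold_s: float) -> int:
    """Count false-positive boxes shown for at least threshold_s seconds.

    durations_s holds the on-screen duration (in seconds) of each
    false-positive bounding box; the values used below are illustrative.
    """
    return sum(1 for d in durations_s if d >= threshold_s)


def sensitivity_and_miss_rate(detected: int, total: int) -> Tuple[float, float]:
    """Return (PDS, PMR) in percent, assuming PMR = 100 - PDS."""
    if total <= 0:
        raise ValueError("total number of polyps must be positive")
    pds = 100.0 * detected / total
    return pds, 100.0 - pds


# Hypothetical false-positive box durations for one video (seconds).
durations = [0.2, 0.6, 0.7, 1.1, 1.6, 2.0]

for threshold in (0.5, 1.0, 1.5):
    alarms = count_false_alarms(durations, threshold)
    print(f">= {threshold} s: {alarms} false-positive alarms")
```

Raising the threshold removes short-lived boxes from the alarm count. Under the same assumption, a true polyp that is boxed only briefly would also go unflagged, which is one way a longer threshold could raise the miss rate.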

Results: In total, 170 colonoscopy videos were used in this study. Deep-GI showed the highest PDS (99.4% vs. 85.4% vs. 66.7%, p<0.01) and the lowest PMR (0.6% vs. 14.6% vs. 33.3%, p<0.01) when compared to CADe and trainees, respectively. Compared to CADe, Deep-GI demonstrated a lower FPR at FP thresholds of ≥0.5 seconds (12.1 vs. 22.4) and ≥1 second (4.4 vs. 6.8) (both p<0.05). However, when the threshold was raised to ≥1.5 seconds, the FPR became comparable (2 vs. 2.4, p=0.3), while the PMR increased from 2% to 10%.

Conclusion: Compared to CADe, Deep-GI demonstrated a higher PDS with significantly lower FPR at ≥0.5- and ≥1-second thresholds. At the ≥1.5-second threshold, both systems showed comparable FPR with increased PMR.

Keywords: Artificial intelligence; Colonoscopy; Computational intelligence; Endoscopy; Polyps.
