A Deep Learning Application of Capsule Endoscopic Gastric Structure Recognition Based on a Transformer Model

Qingyuan Li; Weijie Xie; Yusi Wang; Kaiwen Qin; Mei Huang; Tianbao Liu; Zefeiyun Chen; Lu Chen; Lan Teng; Yuxin Fang; Liuhua Ye; Zhenyu Chen; Jie Zhang; Aimin Li; Wei Yang; Side Liu

doi:10.1097/MCG.0000000000001972

J Clin Gastroenterol. 2024 Mar 4. doi: 10.1097/MCG.0000000000001972. Online ahead of print.

Authors

Qingyuan Li¹, Weijie Xie²,³, Yusi Wang¹, Kaiwen Qin¹, Mei Huang¹, Tianbao Liu², Zefeiyun Chen², Lu Chen¹, Lan Teng¹, Yuxin Fang¹, Liuhua Ye⁴, Zhenyu Chen¹, Jie Zhang¹, Aimin Li¹, Wei Yang²,⁴, Side Liu¹,⁴,⁵

Affiliations

¹ Guangdong Provincial Key Laboratory of Gastroenterology, Department of Gastroenterology, Nanfang Hospital.
² School of Biomedical Engineering.
³ Department of Information, Guangzhou First People's Hospital, School of Medicine, South China University of Technology.
⁴ Pazhou Lab, Guangzhou, Guangdong.
⁵ Department of Gastroenterology, Zhuhai People's Hospital, Zhuhai Hospital Affiliated with Jinan University, Zhuhai, China.

PMID: 38457410
DOI: 10.1097/MCG.0000000000001972

Abstract

Background: Gastric structure recognition systems have become increasingly necessary for the accurate diagnosis of gastric lesions in capsule endoscopy. Deep learning, especially with transformer models, has shown great potential for recognizing gastrointestinal (GI) images by means of self-attention. This study aims to establish a model for identifying gastric structures in capsule endoscopy to improve the clinical applicability of deep learning to endoscopic image recognition.

Methods: A total of 3343 wireless capsule endoscopy videos collected at Nanfang Hospital between 2011 and 2021 were used for unsupervised pretraining, while 2433 were used for training and 118 for validation. Fifteen upper GI structures were selected for quantifying examination quality. We also compared the classification performance of the artificial intelligence (AI) model with that of endoscopists in terms of accuracy, sensitivity, specificity, and positive and negative predictive values.

Results: The transformer-based AI model reached a high level of diagnostic accuracy in gastric structure recognition. In identifying 15 upper GI structures, the AI model achieved a macroaverage accuracy of 99.6% (95% CI: 99.5-99.7), a macroaverage sensitivity of 96.4% (95% CI: 95.3-97.5), and a macroaverage specificity of 99.8% (95% CI: 99.7-99.9), and it showed a high level of interobserver agreement with endoscopists.
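As an illustration of how macroaveraged metrics of this kind are commonly computed, the sketch below treats each structure class one-vs-rest, derives accuracy, sensitivity, specificity, and positive and negative predictive values per class, and then takes the unweighted mean across classes. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names `one_vs_rest_metrics` and `macro_average`, the integer label encoding, and the toy labels are all hypothetical, and the abstract does not state how the confidence intervals were obtained, so none are computed here.

```python
from typing import Dict, List, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> List[Dict[str, float]]:
    """Per-class metrics, treating each class in turn as 'positive' (hypothetical helper)."""

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    results = []
    for c in range(n_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = sum(1 for t, p in pairs if t != c and p != c)
        results.append(
            {
                "accuracy": ratio(tp + tn, tp + fp + fn + tn),
                "sensitivity": ratio(tp, tp + fn),
                "specificity": ratio(tn, tn + fp),
                "ppv": ratio(tp, tp + fp),
                "npv": ratio(tn, tn + fn),
            }
        )
    return results


def macro_average(per_class: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric across classes (NaN if any class is NaN)."""
    values = [m[key] for m in per_class]
    return sum(values) / len(values)


# Toy example with 3 classes (the study used 15 upper GI structures).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
per_class = one_vs_rest_metrics(y_true, y_pred, n_classes=3)
for name in ("accuracy", "sensitivity", "specificity"):
    print(name, round(macro_average(per_class, name), 4))
```

In a one-vs-rest setting with many classes, true negatives dominate each per-class count, which is one reason macroaverage accuracy and specificity tend to sit higher than macroaverage sensitivity.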

Conclusions: The transformer-based AI model can accurately evaluate gastric structure information in capsule endoscopy with the same performance as endoscopists, which will greatly help doctors make diagnoses from large numbers of images and improve examination efficiency.