Deep Learning Methods for Anatomical Landmark Detection in Video Capsule Endoscopy Images

Sodiq Adewole; Michelle Yeghyayan; Dylan Hyatt; Lubaina Ehsan; James Jablonski; Andrew Copland; Sana Syed; Donald Brown

doi:10.1007/978-3-030-63128-4_32

Proc Future Technol Conf (2020). 2021 Nov:1288:426-434. doi: 10.1007/978-3-030-63128-4_32. Epub 2020 Oct 31.

Authors

Sodiq Adewole¹, Michelle Yeghyayan², Dylan Hyatt², Lubaina Ehsan², James Jablonski¹, Andrew Copland², Sana Syed², Donald Brown¹,³

Affiliations

¹ Department of Systems and Information Engineering, University of Virginia, Charlottesville, VA, USA.
² Department of Pediatrics, School of Medicine, University of Virginia, Charlottesville, VA, USA.
³ School of Data Science, University of Virginia, Charlottesville, VA, USA.

Abstract

Video capsule endoscopy (VCE) is an emerging technology that allows examination of the entire gastrointestinal (GI) tract with minimal invasion. While traditional endoscopy with biopsy is the gold standard for diagnosis of most GI diseases, it is invasive and limited by how far the scope can be advanced in the tract. VCE allows gastroenterologists to investigate GI tract abnormalities in detail, with visualization of all parts of the GI tract. The capsule captures continuous real-time images as it is propelled through the GI tract by gut motility. Even though VCE allows for thorough examination, reviewing and analyzing up to eight hours of images (compiled as videos) is tedious and not cost-effective. To pave the way for automated VCE-based diagnosis of GI disease, detecting the location of the capsule would allow a more focused analysis, as well as abnormality detection, in each region of the GI tract. In this paper, we compared four deep convolutional neural network models for feature extraction and detection of the anatomical region of the GI tract captured in VCE images. Our results showed that VGG-Net has superior performance, with the highest average accuracy, precision, recall, and F1-score compared to the other state-of-the-art architectures: GoogLeNet, AlexNet, and ResNet.
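As an illustration of the evaluation measures named above, the following is a minimal sketch of how accuracy and class-averaged precision, recall, and F1-score could be computed for a multi-class landmark classifier. It is not the authors' code: the function name `macro_metrics`, the example region labels, and the choice of macro-averaging (an unweighted mean over classes) are assumptions, since the abstract does not state the anatomical classes or how the averages were taken.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (unweighted mean over classes) is an assumption here;
    the abstract only refers to "average" scores.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical region labels, for illustration only.
y_true = ["stomach", "stomach", "small_bowel", "small_bowel", "colon", "colon"]
y_pred = ["stomach", "small_bowel", "small_bowel", "small_bowel", "colon", "stomach"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

Running the same function on the predictions of each candidate architecture over a shared test set would give directly comparable scores, which is the kind of comparison the abstract describes.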

Keywords: AlexNet; Convolutional neural network; Gastrointestinal tract; GoogLeNet; Gradient-weighted class activation mapping (Grad-CAM); ResNet; VGG-net; Video capsule endoscopy.

Grants and funding

K23 DK117061/DK/NIDDK NIH HHS/United States