Transformer-based multi-task learning for classification and segmentation of gastrointestinal tract endoscopic images

Suigu Tang; Xiaoyuan Yu; Chak Fong Cheang; Yanyan Liang; Penghui Zhao; Hon Ho Yu; I Cheong Choi

doi:10.1016/j.compbiomed.2023.106723

Comput Biol Med. 2023 May:157:106723. doi: 10.1016/j.compbiomed.2023.106723. Epub 2023 Mar 5.

Authors

Suigu Tang¹, Xiaoyuan Yu¹, Chak Fong Cheang², Yanyan Liang¹, Penghui Zhao¹, Hon Ho Yu³, I Cheong Choi³

Affiliations

¹ Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China.
² Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China. Electronic address: cfcheang@must.edu.mo.
³ Kiang Wu Hospital, Macao Special Administrative Region of China.

PMID: 36907035
DOI: 10.1016/j.compbiomed.2023.106723

Abstract

Models based on convolutional neural networks (CNNs) are widely used to help endoscopists identify gastrointestinal (GI) tract diseases through classification and segmentation. However, they have difficulty distinguishing between ambiguous lesion types that look similar in endoscopic images, and they are hard to train when labeled data are scarce. These limitations prevent CNNs from further improving diagnostic accuracy. To address these challenges, we first propose a multi-task network (TransMT-Net) that learns two tasks (classification and segmentation) simultaneously. It uses a transformer to learn global features and combines this with the strength of CNNs in learning local features, so as to predict lesion types and regions in GI tract endoscopic images more accurately. We further adopt active learning in TransMT-Net to tackle its need for large numbers of labeled images. A dataset was created from the CVC-ClinicDB dataset, Macau Kiang Wu Hospital, and Zhongshan Hospital to evaluate model performance. The experimental results show that our model achieved 96.94% accuracy in the classification task and a 77.76% Dice Similarity Coefficient in the segmentation task, and that it outperformed the other models on our test set. Active learning also yielded positive results for our model when starting from a small initial training set: with only 30% of the initial training set, its performance was comparable to that of most comparable models trained on the full training set. Consequently, the proposed TransMT-Net has demonstrated its potential on GI tract endoscopic images, and, through active learning, it can alleviate the shortage of labeled images.
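For readers unfamiliar with the segmentation metric reported above, the following is a minimal sketch of the standard Dice Similarity Coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks. It is an illustration only and not the authors' evaluation code: the function name `dice_coefficient`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made for this example.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal size.

    Masks are given as nested lists of 0/1 values (hypothetical format,
    chosen here only to keep the example dependency-free).
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as lesion in both the prediction and the ground truth.
    intersection = sum(p & t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 3 predicted pixels, 3 ground-truth pixels, 2 shared.
predicted_mask = [[0, 1, 1],
                  [0, 1, 0]]
ground_truth_mask = [[0, 1, 0],
                     [0, 1, 1]]
score = dice_coefficient(predicted_mask, ground_truth_mask)
print(f"Dice: {score:.4f}")
```

In this toy case the overlap is 2 pixels out of 3 + 3, giving a Dice score of 2·2/6 ≈ 0.667. A dataset-level figure such as the one reported in the abstract would typically be an average over test images, though the abstract does not state the exact aggregation used.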

Keywords: Active learning; Classification; Multi-task learning; Segmentation; Transformer.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Endoscopy, Gastrointestinal
Gastrointestinal Tract / diagnostic imaging
Image Processing, Computer-Assisted* / methods
Neural Networks, Computer*