Transformer-based multi-task learning for classification and segmentation of gastrointestinal tract endoscopic images

Comput Biol Med. 2023 May:157:106723. doi: 10.1016/j.compbiomed.2023.106723. Epub 2023 Mar 5.

Abstract

Although convolutional neural network (CNN) models are widely used to help endoscopists identify gastrointestinal (GI) tract diseases through classification and segmentation, they have difficulty distinguishing between ambiguous lesion types that look similar in endoscopic images, and they are hard to train when labeled data are scarce. These limitations prevent CNNs from further improving diagnostic accuracy. To address these challenges, we first proposed a multi-task network (TransMT-Net) that learns two tasks (classification and segmentation) simultaneously. It uses a transformer to learn global features and combines this with the strength of CNNs in learning local features, so as to predict lesion types and regions in GI tract endoscopic images more accurately. We further adopted active learning in TransMT-Net to tackle the shortage of labeled images. A dataset was created from the CVC-ClinicDB dataset, Macau Kiang Wu Hospital, and Zhongshan Hospital to evaluate model performance. The experimental results show that our model achieved 96.94% accuracy in the classification task and a 77.76% Dice Similarity Coefficient in the segmentation task, outperforming the other models on our test set. Active learning also had a positive effect on the performance of our model when starting from a small initial training set: with only 30% of the initial training set, its performance was comparable to that of most comparable models trained on the full training set. Consequently, the proposed TransMT-Net has demonstrated its potential for GI tract endoscopic images, and through active learning it can alleviate the shortage of labeled images.
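
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how classification accuracy and the Dice Similarity Coefficient are conventionally computed. It is not the authors' evaluation code: the function names, the flat 0/1 mask representation, and the handling of two empty masks are assumptions made for illustration only.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are given as flat sequences of 0/1 pixel values
    (hypothetical representation chosen for this sketch).
    """
    pred = [int(bool(p)) for p in pred_mask]
    true = [int(bool(t)) for t in true_mask]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lesion in both masks.
    intersection = sum(p & t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    # Assumption: two empty masks are treated as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def accuracy(pred_labels, true_labels):
    """Fraction of images whose predicted class equals the true class."""
    if len(pred_labels) != len(true_labels) or len(true_labels) == 0:
        raise ValueError("label lists must be non-empty and equally long")
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    # Toy example: 4-pixel masks and 4 classified images.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))
```

In practice a per-image Dice score like this is usually averaged over the test set; whether the reported 77.76% is a per-image mean or a pooled value is not stated in the abstract.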

Keywords: Active learning; Classification; Multi-task learning; Segmentation; Transformer.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Endoscopy, Gastrointestinal
  • Gastrointestinal Tract / diagnostic imaging
  • Image Processing, Computer-Assisted* / methods
  • Neural Networks, Computer*