Context-aware and local-aware fusion with transformer for medical image segmentation

Phys Med Biol. 2024 Jan 10;69(2). doi: 10.1088/1361-6560/ad14c6.

Abstract

Objective. Convolutional neural networks (CNNs) have made significant progress in medical image segmentation tasks. However, for complex segmentation tasks, CNNs lack the ability to establish long-distance relationships, resulting in poor segmentation performance. The characteristics of intra-class diversity and inter-class similarity in images increase the difficulty of segmentation, and some focus areas exhibit a scattered distribution, making segmentation even more challenging.

Approach. This work therefore proposed a new Transformer model, FTransConv, to address the issues of inter-class similarity, intra-class diversity, and scattered distribution in medical image segmentation tasks. To this end, three Transformer-CNN modules were designed to extract global and local information, and a full-scale squeeze-excitation module was proposed for the decoder using the idea of full-scale connections.

Main results. Without any pre-training, this work verified the effectiveness of FTransConv on three public COVID-19 CT datasets and on MoNuSeg. Experiments showed that FTransConv, with only 26.98M parameters, outperformed other state-of-the-art models such as Swin-Unet, TransAttUnet, UCTransNet, LeViT-UNet, TransUNet, UTNet, and SAUNet++. The model achieved the best segmentation performance, with a DSC of 83.22% on the COVID-19 datasets and 79.47% on MoNuSeg.

Significance. This work demonstrates that the proposed method provides a promising solution for regions with high inter-class similarity, intra-class diversity, and scattered distribution in image segmentation.
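The abstract gives only a high-level description of the architecture. As a rough illustration, the PyTorch sketch below shows one plausible reading of the two components under stated assumptions: a Transformer-CNN block that fuses a self-attention (context-aware) branch with a convolutional (local-aware) branch, and a full-scale squeeze-excitation module that concatenates resized multi-scale decoder features before channel reweighting with a standard SE block (Hu et al., 2018). The class names GlobalLocalFusion and FullScaleSE, the channel sizes, and the wiring are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, not the paper's code: illustrates (i) parallel global/local
# feature extraction and (ii) full-scale squeeze-excitation fusion.
# All names, channel sizes, and wiring are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalFusion(nn.Module):
    """Parallel attention (context-aware) and conv (local-aware) branches,
    concatenated and projected back to the input channel width."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        t = self.norm(x.flatten(2).transpose(1, 2))   # (B, HW, C) tokens
        g, _ = self.attn(t, t, t)                     # long-range context
        g = g.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([g, self.local(x)], dim=1))

class FullScaleSE(nn.Module):
    """Resize multi-scale decoder features to a common resolution,
    concatenate them (full-scale connections), and reweight channels with
    a standard squeeze-and-excitation block."""
    def __init__(self, in_channels: list[int], out_channels: int,
                 reduction: int = 16):
        super().__init__()
        total = sum(in_channels)
        self.excite = nn.Sequential(
            nn.Linear(total, total // reduction), nn.ReLU(inplace=True),
            nn.Linear(total // reduction, total), nn.Sigmoid())
        self.project = nn.Conv2d(total, out_channels, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        size = feats[0].shape[2:]                     # fuse at the finest scale
        x = torch.cat([F.interpolate(f, size=size, mode="bilinear",
                                     align_corners=False) for f in feats], dim=1)
        w = self.excite(x.mean(dim=(2, 3)))           # squeeze, then excite
        return self.project(x * w[:, :, None, None])  # channel reweighting

# Usage with dummy decoder features at three scales:
feats = [torch.randn(1, 64, 64, 64),
         torch.randn(1, 128, 32, 32),
         torch.randn(1, 256, 16, 16)]
print(GlobalLocalFusion(64)(feats[0]).shape)          # torch.Size([1, 64, 64, 64])
print(FullScaleSE([64, 128, 256], 64)(feats).shape)   # torch.Size([1, 64, 64, 64])
```

One design note on the sketch: fusing at the finest scale and reweighting the concatenated channels lets the network suppress scales that are uninformative for a given region, which is consistent with the abstract's motivation of handling scattered focus areas; the actual FTransConv wiring may differ.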

Keywords: context-aware information; local-aware information; medical image segmentation; transformer.

MeSH terms

  • COVID-19* / diagnostic imaging
  • Humans
  • Image Processing, Computer-Assisted
  • Neural Networks, Computer