TCI-UNet: transformer-CNN interactive module for medical image segmentation

Biomed Opt Express. 2023 Oct 23;14(11):5904-5920. doi: 10.1364/BOE.499640. eCollection 2023 Nov 1.

Abstract

Medical image segmentation is a crucial step in developing medical systems, especially those that assist doctors in diagnosing and treating diseases. UNet has become the preferred network for most medical image segmentation tasks and has achieved tremendous success. However, owing to the limitations of the convolution operation, its ability to model long-range dependencies between features is limited. With the success of transformers in computer vision (CV), many excellent models that combine transformers with UNet have emerged, but most of them have fixed receptive fields and a single feature extraction method. To address this issue, we propose a transformer-CNN interactive (TCI) feature extraction module and use it to construct TCI-UNet. Specifically, we improve the self-attention mechanism in transformers so that attention maps better guide the allocation of computational resources, which strengthens the network's ability to capture global contextual information from feature maps. Additionally, we introduce local multi-scale information to supplement the features, allowing the network to focus on important local information while modeling global context. This improves the network's ability to extract information from feature maps and facilitates effective interaction between global and local information within the transformer, enhancing its representational power. We conducted extensive experiments on the LiTS-2017 and ISIC-2018 datasets to verify the effectiveness of the proposed method, obtaining Dice values of 93.81% and 88.22%, respectively. Ablation experiments confirmed the effectiveness of the TCI module, and comparisons with other state-of-the-art (SOTA) networks demonstrated the superiority of TCI-UNet in accuracy and generalization.
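
The overlap metric reported in the abstract is, as far as can be inferred, the Dice coefficient, commonly defined for a predicted mask A and a ground-truth mask B as 2|A∩B| / (|A| + |B|). The following is a minimal sketch of that standard definition for flat binary masks. It is not taken from the paper: the function name `dice_coefficient`, the smoothing term `eps`, and the toy masks are assumptions made for illustration, and the authors' exact evaluation protocol (per-case versus global averaging, thresholding) is not stated in the abstract.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) for two flat binary masks (0/1 values).

    `eps` is an assumed smoothing term that avoids 0/0 when both masks are empty.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # For 0/1 masks, the element-wise product counts pixels set in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


# Toy example (hypothetical data): 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)
print(f"Dice: {score:.4f}")  # 2*2 / (3+3), i.e. about 0.6667
```

A value of 1.0 indicates perfect overlap and 0.0 indicates none, so the reported 93.81% and 88.22% correspond to scores of 0.9381 and 0.8822 on this scale.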