Efficient Combination of CNN and Transformer for Dual-Teacher Uncertainty-guided Semi-supervised Medical Image Segmentation

Comput Methods Programs Biomed. 2022 Nov:226:107099. doi: 10.1016/j.cmpb.2022.107099. Epub 2022 Sep 2.

Abstract

Background and objective: Deep learning-based methods for fast target segmentation of magnetic resonance imaging (MRI) have become increasingly popular in recent years. Generally, the success of deep learning methods in medical image segmentation tasks relies on a large amount of labeled data. The time-consuming and labor-intensive problem of data annotation is a major challenge in medical image segmentation tasks. The aim of this work is to enhance the segmentation of MR images using a semi-supervised learning-based method using a small amount of labeled data and a large amount of unlabeled data.

Methods: To utilize the effective information of the unlabeled data, we designed the method of guiding the Student segmentation model simultaneously by the Dual-Teacher structure of CNN and transformer forming the subject network. Both Teacher A and Student models are CNNs, and the TA-S module they form is a mean teacher structure with added data noise. In the TB-S module formed by the combination of Student and Teacher B models, their backbone networks CNN and transformer capture the local and global information of the image at the same time, respectively, to create pseudo labels for each other and perform cross-supervision. The Dual-Teacher guides the Student through synchronous training and performs knowledge rectification and communication with each other through consistent regular constraints, which better utilizes the valid information in the unlabeled data. In addition, the segmentation predictions of Teacher A and Student and Teacher A and Teacher B are screened for uncertainty assessment during the training process to enhance the prediction accuracy and generalization of the model. This method uses the mechanism of simultaneous training of the synthetic structure composed of TA-S and TB-S modules to jointly guide the optimization of the Student model to obtain better segmentation ability.

Results: We evaluated the proposed method on a publicly available MRI dataset from a cardiac segmentation competition organized by MICCAI in 2017. Compared with several existing state-of-the-art semi-supervised segmentation methods, the method achieves better segmentation results in terms of Dice coefficient and HD distance evaluation metrics of 0.878 and 4.9 mm and 0.886 and 5.0 mm, respectively, using a training set containing only 10% and 20% of labeled data.

Conclusion: This method fuses CNN and transformer to design a new Teacher-Student semi-supervised learning optimization strategy, which greatly improves the utilization of a large number of unlabeled medical images and the effectiveness of model segmentation results.

Keywords: Deep learning; Image segmentation; Magnetic resonance imaging (MRI); Semi-supervised learning; Transformer.

MeSH terms

  • Humans
  • Image Processing, Computer-Assisted* / methods
  • Magnetic Resonance Imaging / methods
  • Neural Networks, Computer*
  • Supervised Machine Learning
  • Uncertainty