Swin MoCo: Improving parotid gland MRI segmentation using contrastive learning

Med Phys. 2024 May 15. doi: 10.1002/mp.17128. Online ahead of print.

Abstract

Background: Segmentation of the parotid glands and tumors on MR images is essential for treating parotid gland tumors. However, segmentation of the parotid glands is particularly challenging due to their variable shape and low contrast with surrounding structures.

Purpose: The lack of large, well-annotated datasets limits the development of deep learning for medical images. Contrastive learning, an unsupervised learning method, has developed rapidly in recent years. It makes better use of unlabeled images and has the potential to improve parotid gland segmentation.

Methods: We propose Swin MoCo, a momentum contrastive learning network with Swin Transformer as its backbone. Swin MoCo is initialized with the weights of a model trained with supervision on ImageNet, which improves training on small medical image datasets.
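
To illustrate the general idea behind momentum contrastive learning, the following is a minimal sketch in plain Python. It is not the authors' implementation (see the linked repository for that). It assumes the usual momentum-contrast formulation: a key encoder updated as an exponential moving average of the query encoder, and an InfoNCE loss over one positive and several negative keys. The function names, the momentum value 0.999, and the temperature 0.07 are illustrative assumptions and are not stated in the abstract.

```python
import math

def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: key <- m * key + (1 - m) * query.
    m=0.999 is an assumed illustrative value."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce_loss(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for a single query embedding.
    The positive key sits at index 0 of the logits; the loss is the
    negative log-softmax of that entry. temperature=0.07 is assumed."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(n)) / temperature for n in negative_keys]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]
```

In this sketch the loss is small when the query is closer to its positive key than to the negatives, and large otherwise. In a real network the embeddings would come from the Swin Transformer backbone, and the update would be applied to every parameter tensor of the key encoder.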

Results: Swin MoCo trained with transfer learning improves parotid gland segmentation to a Dice similarity coefficient (DSC) of 89.78%, a mean intersection over union (mIoU) of 85.18%, a Hausdorff distance (HD) of 3.60, and a mean accuracy (mAcc) of 90.08%. On the Synapse multi-organ computed tomography (CT) dataset, using Swin MoCo as the pre-trained model for Swin-Unet yields 79.66% DSC and 12.73 HD, outperforming the best result of Swin-Unet on that dataset.
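
For readers unfamiliar with the overlap metrics reported above, the following minimal sketch computes DSC and IoU for a single binary mask using their standard definitions. It is an illustration only, not the authors' evaluation code. The function name `dice_and_iou`, the flat 0/1 list inputs, and the convention of returning 1.0 when both masks are empty are assumptions. A mean IoU (mIoU) would average the per-class IoU values.

```python
def dice_and_iou(pred, target):
    """Compute Dice similarity coefficient and IoU for two binary masks.
    pred, target: flat sequences of 0/1 labels with equal length."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    if pred_sum + target_sum == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    union = pred_sum + target_sum - intersection
    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou

# One overlapping pixel, two foreground pixels in each mask.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)  # 0.5 and 1/3
```

The two metrics are monotonically related (DSC = 2·IoU / (1 + IoU) for a single mask), which is why DSC is always at least as large as IoU on the same prediction.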

Conclusions: These improvements require only 4 h of training on a single NVIDIA Tesla V100, which is computationally cheap. Swin MoCo offers a new approach to improving task performance on small datasets. The code is publicly available at https://github.com/Zian-Xu/Swin-MoCo.

Keywords: contrastive learning; image segmentation; parotid gland tumor; transformer; unsupervised learning.