Swin MAE: Masked autoencoders for small datasets

Zi'an Xu; Yin Dai; Fayu Liu; Weibing Chen; Yue Liu; Lifu Shi; Sheng Liu; Yuhang Zhou

doi:10.1016/j.compbiomed.2023.107037

Comput Biol Med. 2023 Jul:161:107037. doi: 10.1016/j.compbiomed.2023.107037. Epub 2023 May 23.

Authors

Zi'an Xu¹, Yin Dai², Fayu Liu³, Weibing Chen¹, Yue Liu¹, Lifu Shi⁴, Sheng Liu³, Yuhang Zhou³

Affiliations

¹ Northeastern University, Shenyang, China.
² Northeastern University, Shenyang, China. Electronic address: daiyin@bmie.neu.edu.cn.
³ China Medical University, Shenyang, China.
⁴ Liaoning Jiayin Medical Technology Co., China.

PMID: 37230020
DOI: 10.1016/j.compbiomed.2023.107037

Abstract

The development of deep learning models for medical image analysis is limited mainly by the lack of large, well-annotated datasets. Unsupervised learning requires no labels and is therefore better suited to medical image analysis problems. However, most unsupervised learning methods must be applied to large datasets. To make unsupervised learning applicable to small datasets, we propose Swin MAE, a masked autoencoder with a Swin Transformer backbone. Even on a dataset of only a few thousand medical images, Swin MAE can learn useful semantic features purely from images, without using any pre-trained models. In transfer learning on downstream tasks, it equals or even slightly outperforms the supervised model obtained by training Swin Transformer on ImageNet. Compared with MAE, Swin MAE brought a twofold and a fivefold performance improvement for downstream tasks on BTCV and our parotid dataset, respectively. The code is publicly available at https://github.com/Zian-Xu/Swin-MAE.

Keywords: MAE; Masked autoencoder; Small dataset; Swin transformer; Unsupervised learning.
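
For readers unfamiliar with masked autoencoding, the following is a minimal sketch of the general idea only: hide a random subset of image patches and score a reconstruction on the hidden patches alone. It is not the authors' implementation (see their repository for that). The function names, the 8x8 toy image, the patch size of 2, the 75% mask ratio, and the all-zero "prediction" standing in for a network are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random


def patchify(image, patch):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append(
                [image[top + i][left + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Return one boolean per patch; True means the patch is hidden."""
    num_masked = int(num_patches * mask_ratio)
    masked_ids = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_ids for i in range(num_patches)]


def masked_mse(target, prediction, mask):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for t, p, m in zip(target, prediction, mask):
        if m:
            total += sum((a - b) ** 2 for a, b in zip(t, p))
            count += len(t)
    return total / count if count else 0.0


# Toy data (assumed): an 8x8 "image" whose pixel value encodes its position.
rng = random.Random(0)
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = patchify(image, patch=2)  # 16 patches of 4 pixels each
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)  # hide 12 of 16

# Stand-in for an encoder/decoder: a real model would predict the hidden
# patches from the visible ones; here we simply predict zeros.
prediction = [[0.0] * len(p) for p in patches]
loss = masked_mse(patches, prediction, mask)
```

In a real setup the prediction would come from an encoder-decoder network trained to minimise this loss, and the trained encoder would then be reused for downstream tasks.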

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Parotid Gland*
Problem Solving*
Semantics