Parts2Whole: Self-supervised Contrastive Learning via Reconstruction

Domain Adapt Represent Transf Distrib Collab Learn. 2020 Oct;12444:85-95. doi: 10.1007/978-3-030-60548-3_9. Epub 2020 Sep 26.

Abstract

Contrastive representation learning achieves state-of-the-art results in computer vision, but it demands huge mini-batch sizes, special network designs, or memory banks, making it unappealing for 3D medical imaging. Meanwhile, in 3D medical imaging, reconstruction-based self-supervised learning reaches new heights in performance but lacks a mechanism for learning contrastive representations. This paper therefore proposes a new framework for self-supervised contrastive learning via reconstruction, called Parts2Whole, which exploits the universal and intrinsic part-whole relationship to learn contrastive representations without using a contrastive loss: reconstructing an image (whole) from its own parts compels the model to learn similar latent features for all of those parts, while reconstructing different images (wholes) from their respective parts forces the model to push parts belonging to different wholes farther apart in the latent space; the trained model is thereby capable of distinguishing images. We evaluated Parts2Whole on five distinct imaging tasks covering both classification and segmentation and compared it with four competing publicly available 3D pretrained models, showing that Parts2Whole significantly outperforms them on two of the five tasks while achieving competitive performance on the remaining three. This superior performance is attributable to the contrastive representations learned by Parts2Whole. Code and pretrained models are available at github.com/JLiangLab/Parts2Whole.
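To make the training signal concrete, below is a minimal sketch of one Parts2Whole-style step: crop a random sub-volume (part) from each 3D image, then train an encoder-decoder to reconstruct the corresponding whole. The architecture, crop size, and helper names here (TinyAutoEncoder3D, random_part) are hypothetical illustrations, not the released implementation; see the repository above for the authors' actual models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAutoEncoder3D(nn.Module):
    """Toy 3D encoder-decoder; the released models are much deeper."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, part):
        return self.decoder(self.encoder(part))

def random_part(whole, size=32):
    """Crop a random sub-volume (part) and resize it to the whole's shape."""
    _, _, d, h, w = whole.shape
    z = torch.randint(0, d - size + 1, (1,)).item()
    y = torch.randint(0, h - size + 1, (1,)).item()
    x = torch.randint(0, w - size + 1, (1,)).item()
    crop = whole[:, :, z:z + size, y:y + size, x:x + size]
    return F.interpolate(crop, size=(d, h, w),
                         mode="trilinear", align_corners=False)

model = TinyAutoEncoder3D()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a toy batch of 3D volumes (the "wholes").
wholes = torch.rand(2, 1, 64, 64, 64)
parts = random_part(wholes)
loss = F.mse_loss(model(parts), wholes)  # reconstruct each whole from a part
opt.zero_grad()
loss.backward()
opt.step()
print(f"reconstruction loss: {loss.item():.4f}")
```

Because every part of the same volume must decode to the same whole, the encoder is pulled toward similar latents for parts of one image and, since each part in the batch targets a different whole, toward separated latents across images, yielding the contrastive effect without an explicit contrastive loss.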

Keywords: 3D Self-supervised Learning; Contrastive Representation Learning; Transfer Learning.