Fully Convolutional Network-Based Self-Supervised Learning for Semantic Segmentation

IEEE Trans Neural Netw Learn Syst. 2022 May 11:PP. doi: 10.1109/TNNLS.2022.3172423. Online ahead of print.

Abstract

Although deep learning has achieved great success in many computer vision tasks, its performance relies on the availability of large datasets with densely annotated samples. Such datasets are difficult and expensive to obtain. In this article, we focus on the problem of learning representation from unlabeled data for semantic segmentation. Inspired by two patch-based methods, we develop a novel self-supervised learning framework by formulating the jigsaw puzzle problem as a patch-wise classification problem and solving it with a fully convolutional network. By learning to solve a jigsaw puzzle comprising 25 patches and transferring the learned features to semantic segmentation task, we achieve a 5.8% point improvement on the Cityscapes dataset over the baseline model initialized from random values. It is noted that we use only about 1/6 training images of Cityscapes in our experiment, which is designed to imitate the real cases where fully annotated images are usually limited to a small number. We also show that our self-supervised learning method can be applied to different datasets and models. In particular, we achieved competitive performance with the state-of-the-art methods on the PASCAL VOC2012 dataset using significantly fewer time costs on pretraining.