RepCo: Replenish sample views with better consistency for contrastive learning

Neural Netw. 2023 Nov:168:171-179. doi: 10.1016/j.neunet.2023.09.004. Epub 2023 Sep 11.

Abstract

Contrastive learning methods aim to learn shared representations by minimizing distances between positive pairs and maximizing distances between negative pairs in the embedding space. A key problem in achieving better contrastive learning performance is designing appropriate sample pairs. Most previous works use random crops of the input image as the two views that form a positive pair. However, such strategies lead to suboptimal performance, since the sampled crops may carry inconsistent semantic information, which degrades the quality of the contrastive views. To address this limitation, we explore replenishing sample views with better consistency within the image and propose RepCo, a novel self-supervised learning (SSL) framework. Instead of searching for semantically consistent patches between two different views, we select patches from the same image to replenish the positive/negative pairs: patches that are similar but come from different positions are encouraged to form positive pairs, while patches that are dissimilar but spatially adjacent are forced to have different representations, i.e., they form negative pairs that enrich the learned representations. Our method effectively generates high-quality contrastive views, exploits the untapped semantic consistency within images, and provides more informative representations for downstream tasks. Experiments on a range of downstream tasks show that our approach achieves gains of +2.1 AP50 (COCO pre-trained) and +1.6 AP50 (ImageNet pre-trained) on Pascal VOC object detection, and +2.3 mIoU on Cityscapes semantic segmentation.
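
The pairing rule in the abstract lends itself to a short illustration. Below is a minimal sketch, not the authors' implementation: the function name contrastive_patch_loss and the parameters sim_threshold, adj_radius, and temperature are hypothetical, and the InfoNCE-style loss is just one plausible way to realize the described pull/push objective over patches from the same image.

```python
# Hypothetical sketch of the same-image patch-pairing idea; not the RepCo reference code.
import torch
import torch.nn.functional as F

def contrastive_patch_loss(patch_emb, positions, sim_threshold=0.5,
                           adj_radius=1.0, temperature=0.2):
    """patch_emb: (N, D) embeddings of N patches cropped from one image.
    positions: (N, 2) grid coordinates of the patches."""
    z = F.normalize(patch_emb, dim=1)            # unit-norm embeddings
    sim = z @ z.t()                              # pairwise cosine similarity
    dist = torch.cdist(positions.float(), positions.float())  # spatial distance

    # Positive pairs: similar content, but sampled from distant positions.
    pos_mask = (sim > sim_threshold) & (dist > adj_radius)
    # Negative pairs: adjacent positions, but dissimilar content.
    neg_mask = (sim <= sim_threshold) & (dist <= adj_radius) & (dist > 0)

    exp_logits = torch.exp(sim / temperature)
    # InfoNCE-style objective: pull positives together, push negatives apart.
    pos_term = (exp_logits * pos_mask).sum(dim=1)
    denom = pos_term + (exp_logits * neg_mask).sum(dim=1)
    valid = pos_mask.any(dim=1)                  # patches with >= 1 positive
    loss = -torch.log(pos_term[valid] / denom[valid])
    return loss.mean() if valid.any() else sim.new_zeros(())


# Toy usage: 16 patches with 128-dim features on a 4x4 position grid.
emb = torch.randn(16, 128)
pos = torch.stack(torch.meshgrid(torch.arange(4), torch.arange(4),
                                 indexing="ij"), dim=-1).reshape(-1, 2)
loss = contrastive_patch_loss(emb, pos)
```

In an actual pre-training pipeline, the similarity and adjacency criteria would be computed on encoder features of patches sampled from the input image; the thresholds above are placeholders chosen only to make the sketch self-contained.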

Keywords: Contrastive learning; Sampling strategy; Self-supervised pretraining.

MeSH terms

  • Learning*
  • Neural Networks, Computer*
  • Semantics*