Robust and Rotation-Equivariant Contrastive Learning

IEEE Trans Neural Netw Learn Syst. 2023 Feb 16:PP. doi: 10.1109/TNNLS.2023.3243258. Online ahead of print.

Abstract

Contrastive learning (CL) methods achieve great success by learning representations that are invariant to various transformations. However, rotation transformations are considered harmful to CL and are rarely used, which leads to failure when objects appear in unseen orientations. This article proposes a representation focus shift network (RefosNet), which adds rotation transformations to CL methods to improve the robustness of the learned representations. First, RefosNet constructs a rotation-equivariant mapping between the features of the original image and those of its rotated counterparts. Then, RefosNet learns semantic-invariant representations (SIRs) by explicitly decoupling rotation-invariant features from rotation-equivariant ones. Moreover, an adaptive gradient passivation strategy is introduced to gradually shift the representation focus toward invariant representations. This strategy prevents catastrophic forgetting of rotation equivariance, which benefits the generalization of representations to both seen and unseen orientations. We adapt the baseline methods (i.e., SimCLR and momentum contrast (MoCo) v2) to work with RefosNet to verify the performance. Extensive experimental results show that our method achieves significant improvements on recognition tasks. On ObjectNet-13 with unseen orientations, RefosNet gains 7.12% in classification accuracy over SimCLR. On datasets with seen orientations, performance improves by 5.5% on ImageNet-100, 7.29% on STL10, and 1.93% on CIFAR10. In addition, RefosNet generalizes strongly to Place205, PASCAL VOC, and Caltech 101. Our method also achieves satisfactory results on image retrieval tasks.
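
Since the abstract only summarizes the approach, the sketch below is a hypothetical illustration of the general idea rather than the authors' implementation: a SimCLR-style encoder whose projection is split into a rotation-invariant half (trained with InfoNCE across rotated views) and a rotation-equivariant half (trained with a rotation-prediction surrogate), plus a scalar "passivation" factor that gradually damps the gradient reaching the equivariant branch. The names (SplitProjector, rot_head, training_step) and the specific loss choices are assumptions, not details taken from the paper.

```python
# Hypothetical sketch of rotation-aware contrastive learning (not the authors' code).
# Assumptions: the projection output is split into an "invariant" half and an
# "equivariant" half; rotated views use 90-degree rotations; "gradient passivation"
# is modeled as a scalar that damps gradients into the equivariant branch over time.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SplitProjector(nn.Module):
    """Encoder + projection head whose output is split into two halves."""

    def __init__(self, backbone: nn.Module, feat_dim: int, proj_dim: int = 128):
        super().__init__()
        self.backbone = backbone
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x):
        z = self.proj(self.backbone(x))
        half = z.shape[1] // 2
        return z[:, :half], z[:, half:]  # (rotation-invariant, rotation-equivariant)


def info_nce(z1, z2, temperature: float = 0.5):
    """Standard SimCLR-style InfoNCE between two batches of embeddings."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, targets)


def training_step(model, x, rot_head, step, total_steps, lambda_eq=1.0):
    """One hypothetical step: contrast invariant features across a rotated view,
    predict the rotation from equivariant features, and damp the equivariant
    gradient over time so the focus shifts to the invariant representation."""
    # One of four 90-degree rotations of the same batch (0, 90, 180, 270 degrees).
    k = torch.randint(0, 4, (1,)).item()
    x_rot = torch.rot90(x, k, dims=(2, 3))

    inv, _ = model(x)
    inv_rot, eq_rot = model(x_rot)

    # "Passivation": linearly shrink the gradient reaching the equivariant branch
    # while leaving the forward value unchanged.
    passivation = max(0.0, 1.0 - step / total_steps)
    eq_rot = passivation * eq_rot + (1 - passivation) * eq_rot.detach()

    loss_inv = info_nce(inv, inv_rot)  # rotation-invariant contrast
    rot_targets = torch.full((x.shape[0],), k, dtype=torch.long, device=x.device)
    loss_eq = F.cross_entropy(rot_head(eq_rot), rot_targets)  # equivariance surrogate
    return loss_inv + lambda_eq * loss_eq
```

A typical usage, under these assumptions, would wrap a ResNet backbone (classification head removed) in SplitProjector, pair it with a small linear rot_head mapping the equivariant half to four rotation classes, and call training_step inside an otherwise standard SimCLR or MoCo v2 training loop.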