Learning an Invariant and Equivariant Network for Weakly Supervised Object Detection

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):11977-11992. doi: 10.1109/TPAMI.2023.3275142. Epub 2023 Sep 5.

Abstract

Weakly Supervised Object Detection (WSOD) is of increasing importance in the computer vision community owing to its extensive applications and low manual annotation cost. Most advanced WSOD approaches build upon an indefinite and quality-agnostic framework, leading to unstable and incomplete object detectors. This paper attributes these issues to inconsistent learning across object variations and to unawareness of localization quality, and constructs a novel end-to-end Invariant and Equivariant Network (IENet). IENet is implemented with flexible multi-branch online refinement, making it naturally more comprehensively perceptive of varied objects. Specifically, IENet first performs label propagation from the predicted instances to their transformed counterparts in a progressive manner, achieving affine-invariant learning. Meanwhile, IENet uses rotation-equivariant learning as a pretext task and derives an instance-level rotation-equivariant branch that is aware of localization quality. With affine-invariant learning and rotation-equivariant learning, IENet encourages consistent and holistic feature learning for WSOD without additional annotations. On challenging datasets of both natural scenes and aerial scenes, we substantially boost WSOD to new state-of-the-art performance. The code has been released at: https://github.com/XiaoxFeng/IENet.
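
The two properties named in the abstract can be illustrated with a toy example. The sketch below is not the IENet implementation and is not taken from the released code. It assumes a hypothetical binary mask as the "image", a trivial mask-based stand-in for a detector, and a single 90-degree rotation as the transformation. Under those assumptions it checks that the predicted box moves with the image (equivariance) while a per-instance score stays unchanged (invariance).

```python
import numpy as np

def box_from_mask(mask):
    """Toy stand-in for a detector: tight half-open box (x1, y1, x2, y2)
    around the nonzero pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)

def rot90_box(box, width):
    """Map a half-open box through one counter-clockwise 90-degree rotation
    (the convention of np.rot90) of an image with the given original width."""
    x1, y1, x2, y2 = box
    # A pixel at (x, y) moves to (y, width - 1 - x) under np.rot90.
    return (y1, width - x2, y2, width - x1)

def box_score(mask, box):
    """Toy instance score: mean mask value inside the box."""
    x1, y1, x2, y2 = box
    return float(mask[y1:y2, x1:x2].mean())

# Hypothetical 6x8 "image" containing a single rectangular object.
mask = np.zeros((6, 8), dtype=np.float32)
mask[1:4, 2:7] = 1.0

box = box_from_mask(mask)            # prediction on the original image
rotated = np.rot90(mask)             # transformed image
box_rot = box_from_mask(rotated)     # prediction on the transformed image

# Equivariance: predicting then transforming equals transforming then predicting.
equivariant = box_rot == rot90_box(box, width=mask.shape[1])

# Invariance: the instance score does not change under the transformation.
invariant = box_score(mask, box) == box_score(rotated, box_rot)

print(box, box_rot, equivariant, invariant)
```

In a learned detector neither property holds automatically, which is why a method would need to encourage them during training, for example through consistency terms between predictions on original and transformed inputs. The exact formulation used by IENet is not given in the abstract.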