TransWeaver: Weave Image Pairs for Class Agnostic Common Object Detection

IEEE Trans Image Process. 2023;32:2947-2959. doi: 10.1109/TIP.2023.3275870. Epub 2023 May 26.

Abstract

Measuring the similarity of two images is of crucial importance in computer vision. Class-agnostic common object detection is a nascent research topic in mining image similarity; it aims to detect common object pairs from two images without category information. The task is general and less restrictive: it explores the similarity between objects and can further describe the commonality of image pairs at the object level. However, previous works suffer from poorly discriminative features caused by the lack of category information. Moreover, most existing methods compare objects extracted from the two images in a simple and direct way, ignoring the internal relationships between objects in the two images. To overcome these limitations, we propose a new framework called TransWeaver, which learns intrinsic relationships between objects. TransWeaver takes image pairs as input and flexibly captures the inherent correlation between candidate objects from the two images. It consists of two modules, the representation-encoder and the weave-decoder, and captures efficient context information by weaving image pairs so that they interact with each other. The representation-encoder performs representation learning and obtains more discriminative representations for candidate proposals. The weave-decoder weaves the objects from the two images and explores inter-image and intra-image context information at the same time, which improves object matching. We reorganize the PASCAL VOC, COCO, and Visual Genome datasets to obtain training and testing image pairs. Extensive experiments demonstrate the effectiveness of the proposed TransWeaver, which achieves state-of-the-art performance on all three datasets.
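
The weaving idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the layers, so the sketch assumes plain, weight-free scaled dot-product self-attention as a stand-in for both modules, and every name in it (`self_attention`, `weave_and_match`, the feature sizes) is hypothetical. It only shows how proposals from two images might first be refined within each image and then concatenated ("woven") so that each proposal attends to proposals from both images before pairwise matching.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Single-head scaled dot-product self-attention without learned weights
    (an assumption made for brevity; a real model would learn projections)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def weave_and_match(feats_a, feats_b):
    """Hypothetical sketch: refine proposals per image, weave both sets
    together, then score every cross-image proposal pair."""
    n_a = feats_a.shape[0]

    # Stand-in for the representation-encoder: intra-image refinement
    # with a residual connection.
    enc_a = feats_a + self_attention(feats_a)
    enc_b = feats_b + self_attention(feats_b)

    # Stand-in for the weave-decoder: concatenate proposals from both images
    # so attention covers intra-image and inter-image context at once.
    woven = np.concatenate([enc_a, enc_b], axis=0)
    woven = woven + self_attention(woven)
    out_a, out_b = woven[:n_a], woven[n_a:]

    # Cosine similarity between every proposal in A and every proposal in B.
    out_a = out_a / np.linalg.norm(out_a, axis=1, keepdims=True)
    out_b = out_b / np.linalg.norm(out_b, axis=1, keepdims=True)
    return out_a @ out_b.T


# Toy input: 3 proposals from image A and 4 from image B, 8-dim features.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(3, 8))
feats_b = rng.normal(size=(4, 8))
sim = weave_and_match(feats_a, feats_b)  # shape (3, 4)
```

Entry `sim[i, j]` scores proposal `i` of the first image against proposal `j` of the second; a detector along these lines would threshold or rank such scores to report common object pairs.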