Complex Relation Embedding for Scene Graph Generation

IEEE Trans Neural Netw Learn Syst. 2022 Dec 15:PP. doi: 10.1109/TNNLS.2022.3226871. Online ahead of print.

Abstract

Given an input image, scene graph generation (SGG) aims to produce a comprehensive graph of the visual relationships between objects. Recent work has focused on designing complex networks and elaborate strategies to address the long-tail problem caused by the imbalanced class distribution. However, most existing methods simply concatenate the real-valued features of two objects to form the final relation representation for a given triplet. We argue that such simple concatenation may neglect the complex interactions between objects, which give rise to the diversity of visual relations, and that representation learning in real space is inadequate to express this property. To alleviate these issues, we incorporate the Hermitian inner product into existing models and learn Relation Embedding in Complex space (CoRE) to facilitate scene graph generation. More specifically, we first introduce complex-valued representations for entities and then formulate relation triplets with the Hermitian inner product in complex space. Finally, we investigate the effect of using only the real component, or both the real and imaginary components, of the Hermitian inner product on inferring more reasonable interactions between objects in scene graphs. Comprehensive experiments on two widely used benchmark datasets, Visual Genome (VG) and Open Image, demonstrate the effectiveness, superiority, and generalization of our method on various metrics for both biased and unbiased inference.
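To make the role of the Hermitian inner product concrete, the following is a minimal NumPy sketch, not the CoRE implementation. The abstract does not give the exact formulation, so the function names (`hermitian_inner`, `relation_features`), the embedding dimension, and the random embeddings are all illustrative assumptions. The sketch relies only on the standard definition ⟨u, v⟩ = Σᵢ uᵢ · conj(vᵢ), and shows why the imaginary component can carry direction information that the real component alone cannot.

```python
import numpy as np


def hermitian_inner(u, v):
    """Standard Hermitian inner product: sum_i u_i * conj(v_i)."""
    return np.sum(u * np.conj(v))


def relation_features(subj, obj, use_imag=True):
    """Hypothetical relation representation built from the elementwise
    Hermitian product of subject and object embeddings (an assumption,
    not the formulation used in the paper)."""
    h = subj * np.conj(obj)
    if use_imag:
        # Keep both components: real part followed by imaginary part.
        return np.concatenate([h.real, h.imag])
    # Keep only the real component.
    return h.real


# Random complex-valued entity embeddings, for illustration only.
rng = np.random.default_rng(0)
dim = 4
subj = rng.normal(size=dim) + 1j * rng.normal(size=dim)
obj = rng.normal(size=dim) + 1j * rng.normal(size=dim)

forward = hermitian_inner(subj, obj)   # subject -> object
backward = hermitian_inner(obj, subj)  # object -> subject

# Swapping the arguments conjugates the result: the real part is
# unchanged, while the imaginary part flips sign.
print(np.isclose(forward.real, backward.real))
print(np.isclose(forward.imag, -backward.imag))
```

Because swapping subject and object conjugates the result, the real part is symmetric and the imaginary part is antisymmetric. Under the assumptions of this sketch, a model that keeps only the real component cannot distinguish the ordered pair (subject, object) from (object, subject) at this step, whereas keeping both components preserves that order.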