Domain-Adversarial-Guided Siamese Network for Unsupervised Cross-Domain 3-D Object Retrieval

IEEE Trans Cybern. 2022 Dec;52(12):13862-13873. doi: 10.1109/TCYB.2021.3139927. Epub 2022 Nov 18.

Abstract

Recent advances in 3-D sensors and 3-D modeling have made massive amounts of 3-D data available. In real applications, manually labeling such a large number of 3-D objects is onerous and time-consuming. In this article, we address this issue by transferring knowledge from existing labeled data (e.g., annotated 2-D images or 3-D objects) to unlabeled 3-D objects. Specifically, we propose a domain-adversarial-guided siamese network (DAGSN) for unsupervised cross-domain 3-D object retrieval (CD3DOR). It is composed of three key modules: 1) siamese network-based visual feature learning; 2) mutual information (MI)-based feature enhancement; and 3) conditional domain classifier-based feature adaptation. First, we design a siamese network to encode both the 3-D objects and the 2-D images from the two domains because it balances accuracy and efficiency; moreover, it guarantees that the same transformation is applied to both domains, which is crucial for positive transfer across the domain shift. The core issue for the retrieval task is to improve the capability of feature abstraction, whereas previous CD3DOR approaches focus merely on eliminating the domain shift. We address this in the second module by maximizing the MI between the input data (3-D objects or 2-D images) and the high-level features. To eliminate the domain shift, we design a conditional domain classifier that exploits multiplicative interactions between the features and the predicted labels to enforce joint alignment at both the feature level and the category level. Consequently, the network generates domain-invariant yet discriminative features for both domains, which is essential for CD3DOR. Extensive experiments on two protocols, the cross-dataset 3-D object retrieval protocol (3-D to 3-D) on PSB/NTU and the cross-modal 3-D object retrieval protocol (2-D to 3-D) on MI3DOR-2, demonstrate that the proposed DAGSN significantly outperforms state-of-the-art CD3DOR methods.
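To make the third module concrete, the sketch below illustrates one common way to realize a conditional domain classifier that conditions the adversarial discriminator on the multiplicative (outer-product) interaction between features and predicted class probabilities, trained through a gradient-reversal layer. This is a minimal PyTorch-style sketch under assumed design choices; the class names, layer sizes, and the helper `GradReverse` are hypothetical and are not taken from the paper's implementation.

```python
# Hypothetical sketch: conditional domain classifier fed with the
# outer product of features and predicted class probabilities.
# Names and sizes are illustrative, not the paper's actual code.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class ConditionalDomainClassifier(nn.Module):
    """Domain discriminator applied to the flattened feature-label outer product."""
    def __init__(self, feat_dim, num_classes, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim * num_classes, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),  # one logit: source vs. target domain
        )

    def forward(self, features, class_probs, lambd=1.0):
        # Multiplicative interaction: outer product p ⊗ f, then flatten.
        joint = torch.bmm(class_probs.unsqueeze(2), features.unsqueeze(1))
        joint = joint.flatten(start_dim=1)
        joint = GradReverse.apply(joint, lambd)  # adversarial training signal
        return self.net(joint)


# Usage with features from a shared (siamese) encoder for both domains.
feat_dim, num_classes = 256, 40
clf = ConditionalDomainClassifier(feat_dim, num_classes)
feats = torch.randn(8, feat_dim)                        # encoder output
probs = torch.softmax(torch.randn(8, num_classes), 1)   # predicted labels
domain_logit = clf(feats, probs)                        # train with BCEWithLogitsLoss
print(domain_logit.shape)  # torch.Size([8, 1])
```

Conditioning the discriminator on the feature-label product, rather than on features alone, is what lets the adversarial alignment act jointly at the feature level and the category level, as described in the abstract.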