Cross-modal dual subspace learning with adversarial network

Neural Netw. 2020 Jun;126:132-142. doi: 10.1016/j.neunet.2020.03.015. Epub 2020 Mar 19.

Abstract

Cross-modal retrieval has recently attracted much interest with the rapid growth of multimodal data; its two key challenges are to exploit the complementary relationships among different modalities effectively and to reduce the heterogeneity gap between them as much as possible. In this paper, we present a novel network model termed cross-modal Dual Subspace learning with Adversarial Network (DSAN). The main contributions are as follows: (1) Dual subspaces (a visual subspace and a textual subspace) are proposed, which better mine the underlying structural information of each modality as well as modality-specific information. (2) An improved quadruplet loss is proposed, which accounts for both the relative and absolute distances between positive and negative samples and incorporates hard-sample mining. (3) An intra-modal constrained loss is proposed to maximize the distance between the most similar cross-modal negative samples and their corresponding cross-modal positive samples. In particular, feature preservation and modality classification act as two antagonists: DSAN tries to narrow the heterogeneity gap between modalities, while the modality classifier tries to identify the original modality of random samples in the dual subspaces. Comprehensive experimental results demonstrate that DSAN significantly outperforms nine state-of-the-art methods on four cross-modal datasets.
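The abstract only names its loss functions, so the PyTorch sketch below illustrates the two technical ingredients it describes. The quadruplet loss follows the standard formulation (a relative anchor-positive vs. anchor-negative margin plus an absolute margin against a negative-negative pair) with batch-hard negative mining; the margins, function names, and mining strategy are illustrative assumptions, not the authors' exact "improved" variant. The adversarial game between feature preservation and modality classification is sketched with a gradient-reversal modality discriminator, one common realization; the abstract does not state whether DSAN uses gradient reversal or alternating min-max updates.

```python
import torch
import torch.nn.functional as F

def hardest_negative(anchor, candidates, cand_labels, anchor_label):
    """Batch-hard mining: among candidates whose label differs from the
    anchor's, return the one closest to the anchor (the hardest negative)."""
    dists = F.pairwise_distance(anchor.unsqueeze(0), candidates)   # (N,)
    dists = dists.masked_fill(cand_labels == anchor_label, float("inf"))
    return candidates[dists.argmin()]

def quadruplet_loss(anchor, positive, neg1, neg2, margin1=1.0, margin2=0.5):
    """Quadruplet loss: a relative term (anchor-positive vs. anchor-negative)
    plus an absolute term (anchor-positive vs. a negative-negative pair).
    Margins are illustrative placeholders."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, neg1)
    d_nn = F.pairwise_distance(neg1, neg2)
    relative = F.relu(d_ap - d_an + margin1)   # rank negatives past positives
    absolute = F.relu(d_ap - d_nn + margin2)   # bound positive distances overall
    return (relative + absolute).mean()

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass,
    so the encoders learn to fool the modality classifier."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

modality_clf = torch.nn.Sequential(            # image-vs-text discriminator
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

def adversarial_loss(features, modality_labels, lamb=1.0):
    """Cross-entropy of the modality discriminator on gradient-reversed
    features: the discriminator tries to tell image from text embeddings,
    while the reversed gradient pushes the encoders to erase that cue."""
    logits = modality_clf(GradReverse.apply(features, lamb))
    return F.cross_entropy(logits, modality_labels)

# Toy usage on random 128-d embeddings.
a, p, n2 = (torch.randn(8, 128) for _ in range(3))
labels = torch.randint(0, 5, (8,))
n1 = torch.stack([hardest_negative(a[i], p, labels, labels[i]) for i in range(8)])
loss = quadruplet_loss(a, p, n1, n2) + adversarial_loss(
    torch.cat([a, p]), torch.cat([torch.zeros(8), torch.ones(8)]).long())
```

In the full model, the embedding networks and the modality classifier would be trained jointly, with the gradient-reversal coefficient (here `lamb`) balancing feature preservation against modality confusion; all names here are hypothetical, as the paper's code is not reproduced in the abstract.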

Keywords: Adversarial network; Cross-modal retrieval; Subspace learning.

MeSH terms

  • Game Theory*
  • Humans
  • Machine Learning* / trends
  • Neural Networks, Computer*
  • Photic Stimulation / methods
  • Social Media* / trends