Cross-modal dual subspace learning with adversarial network

Neural Netw. 2020 Jun;126:132-142. doi: 10.1016/j.neunet.2020.03.015. Epub 2020 Mar 19.

Abstract

Cross-modal retrieval has recently attracted much interest with the rapid growth of multimodal data; its two key challenges are to exploit the complementary relationships among different modalities effectively and to reduce the heterogeneity gap between them as much as possible. In this paper, we present a novel network model termed cross-modal Dual Subspace learning with Adversarial Network (DSAN). The main contributions are as follows: (1) Dual subspaces (a visual subspace and a textual subspace) are proposed, which better mine the underlying structural information of each modality as well as modality-specific information. (2) An improved quadruplet loss is proposed, which accounts for both the relative and absolute distances between positive and negative samples and incorporates hard-sample mining. (3) An intra-modal constrained loss is proposed to maximize the distance between the most similar cross-modal negative samples and their corresponding cross-modal positive samples. In particular, feature preservation and modality classification act as two antagonists: DSAN tries to narrow the heterogeneity gap between modalities, while the modality classifier tries to identify the original modality of random samples in the dual subspaces. Comprehensive experimental results demonstrate that DSAN significantly outperforms nine state-of-the-art methods on four cross-modal datasets.
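The abstract only names its loss functions, so the PyTorch sketch below illustrates the two technical ingredients it describes. The quadruplet loss follows the standard formulation (a relative anchor-positive vs. anchor-negative margin plus an absolute margin against a negative-negative pair) with batch-hard negative mining; the margins, function names, and mining strategy are illustrative assumptions, not the authors' exact "improved" variant. The adversarial game between feature preservation and modality classification is sketched with a gradient-reversal modality discriminator, one common realization; the abstract does not state whether DSAN uses gradient reversal or alternating min-max updates.

```python
import torch
import torch.nn.functional as F

def hardest_negative(anchor, candidates, cand_labels, anchor_label):
    """Batch-hard mining: among candidates whose label differs from the
    anchor's, return the one closest to the anchor (the hardest negative)."""
    dists = F.pairwise_distance(anchor.unsqueeze(0), candidates)   # (N,)
    dists = dists.masked_fill(cand_labels == anchor_label, float("inf"))
    return candidates[dists.argmin()]

def quadruplet_loss(anchor, positive, neg1, neg2, margin1=1.0, margin2=0.5):
    """Quadruplet loss: a relative term (anchor-positive vs. anchor-negative)
    plus an absolute term (anchor-positive vs. a negative-negative pair).
    Margins are illustrative placeholders."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, neg1)
    d_nn = F.pairwise_distance(neg1, neg2)
    relative = F.relu(d_ap - d_an + margin1)   # rank negatives past positives
    absolute = F.relu(d_ap - d_nn + margin2)   # bound positive distances overall
    return (relative + absolute).mean()

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass,
    so the encoders learn to fool the modality classifier."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

modality_clf = torch.nn.Sequential(            # image-vs-text discriminator
    torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

def adversarial_loss(features, modality_labels, lamb=1.0):
    """Cross-entropy of the modality discriminator on gradient-reversed
    features: the discriminator tries to tell image from text embeddings,
    while the reversed gradient pushes the encoders to erase that cue."""
    logits = modality_clf(GradReverse.apply(features, lamb))
    return F.cross_entropy(logits, modality_labels)

# Toy usage on random 128-d embeddings.
a, p, n2 = (torch.randn(8, 128) for _ in range(3))
labels = torch.randint(0, 5, (8,))
n1 = torch.stack([hardest_negative(a[i], p, labels, labels[i]) for i in range(8)])
loss = quadruplet_loss(a, p, n1, n2) + adversarial_loss(
    torch.cat([a, p]), torch.cat([torch.zeros(8), torch.ones(8)]).long())
```

In the full model, the embedding networks and the modality classifier would be trained jointly, with the gradient-reversal coefficient (here `lamb`) balancing feature preservation against modality confusion; all names here are hypothetical, as the paper's code is not reproduced in the abstract.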

Keywords: Adversarial network; Cross-modal retrieval; Subspace learning.

MeSH terms

  • Game Theory*
  • Humans
  • Machine Learning* / trends
  • Neural Networks, Computer*
  • Photic Stimulation / methods
  • Social Media* / trends