Hierarchical semantic interaction-based deep hashing network for cross-modal retrieval

PeerJ Comput Sci. 2021 May 25:7:e552. doi: 10.7717/peerj-cs.552. eCollection 2021.

Abstract

Owing to the high efficiency of hashing and the high level of abstraction provided by deep networks, deep hashing has achieved appealing effectiveness and efficiency for large-scale cross-modal retrieval. However, two challenges remain for high-performance cross-modal hashing retrieval: how to efficiently measure the similarity of fine-grained multi-labels for multi-modal data, and how to thoroughly exploit the layer-specific information in the intermediate layers of the networks. In this paper, we therefore propose a novel Hierarchical Semantic Interaction-based Deep Hashing Network (HSIDHN) for large-scale cross-modal retrieval. In the proposed HSIDHN, multi-scale and fusion operations are first applied to each layer of the network. A Bidirectional Bi-linear Interaction (BBI) policy is then designed to achieve hierarchical semantic interaction among different layers, so that the capability of the hash representations is enhanced. Moreover, a dual-similarity measurement ("hard" similarity and "soft" similarity) is designed to calculate the semantic similarity between data of different modalities, aiming to better preserve the semantic correlation of multi-labels. Extensive experimental results on two large-scale public datasets show that the performance of HSIDHN is competitive with state-of-the-art deep cross-modal hashing methods.
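
The abstract does not give formulas for the dual-similarity measurement, so the following is only a minimal sketch of one common way such a pair of measures is built from multi-hot label vectors. It assumes that "hard" similarity is 1 when two samples share at least one label and 0 otherwise, and that "soft" similarity is the cosine similarity between their label vectors; both definitions, the function names, and the toy labels are illustrative assumptions, not the definitions used in HSIDHN.

```python
import numpy as np


def hard_similarity(labels_a, labels_b):
    """Assumed 'hard' similarity: 1 if two samples share any label, else 0."""
    # The dot product of multi-hot vectors counts the shared labels.
    return (labels_a @ labels_b.T > 0).astype(np.float32)


def soft_similarity(labels_a, labels_b, eps=1e-12):
    """Assumed 'soft' similarity: cosine similarity of multi-hot label vectors."""
    a = labels_a / (np.linalg.norm(labels_a, axis=1, keepdims=True) + eps)
    b = labels_b / (np.linalg.norm(labels_b, axis=1, keepdims=True) + eps)
    return a @ b.T


# Hypothetical toy labels: 2 images and 3 texts over 4 categories.
image_labels = np.array([[1, 1, 0, 0],
                         [0, 0, 1, 0]], dtype=np.float32)
text_labels = np.array([[1, 0, 0, 0],
                        [0, 0, 1, 1],
                        [0, 1, 0, 0]], dtype=np.float32)

hard = hard_similarity(image_labels, text_labels)  # shape (2, 3), values in {0, 1}
soft = soft_similarity(image_labels, text_labels)  # shape (2, 3), values in [0, 1]
print(hard)
print(soft)
```

Under these assumptions, the hard matrix only records whether a pair is related at all, while the soft matrix grades how much of the label set the pair shares, which is the kind of fine-grained multi-label correlation the abstract aims to preserve.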

Keywords: Bidirectional Bi-linear Interaction; Cross-Modal Hashing; Deep Neural Network; Dual-Similarity Measurement.

Grants and funding

This work is supported by the National Natural Science Foundation of China (61806168), Venture & Innovation Support Program for Chongqing Overseas Returnees (CX2018075), and Fundamental Research Funds for the Central Universities (SWU117059). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.