Discrete Semantic Alignment Hashing for Cross-Media Retrieval

IEEE Trans Cybern. 2020 Dec;50(12):4896-4907. doi: 10.1109/TCYB.2019.2912644. Epub 2020 Dec 3.

Abstract

Cross-media hashing, which maps data from different modalities (for example, images and texts) into a shared low-dimensional Hamming space, has attracted considerable attention due to the rapid growth of multimodal data. Recent cross-media hashing methods mainly aim to learn compact hash codes that preserve label-based or feature-based similarities among samples. However, these methods ignore the unbalanced semantic gaps between different modalities and high-level semantic concepts, which generally leads to less effective hash functions and unsatisfactory retrieval performance. Specifically, the keywords of texts carry semantic meaning, whereas the low-level features of images lack it, so the semantic gap in the image modality is larger than that in the text modality. In this paper, we propose a simple yet effective hashing method for cross-media retrieval that addresses this problem, dubbed discrete semantic alignment hashing (DSAH). First, DSAH exploits collaborative filtering to mine the relations between class labels and hash codes, which reduces memory consumption and computational cost compared with pairwise similarity. Then, attributes of the image modality are employed to align its semantic information with that of the text modality. Finally, to further improve the quality of the hash codes, we propose a discrete optimization algorithm that learns the discrete hash codes directly, with a closed-form solution for each bit. Extensive experiments on multiple public databases show that our model can seamlessly incorporate attributes and achieve promising performance.
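To make the generic ingredients named in the abstract concrete (a factorization linking class labels to hash codes, bit-wise closed-form discrete updates, and Hamming-distance retrieval), here is a minimal Python sketch under our own assumptions. It uses a hypothetical objective, min over U and B in {-1,+1}^(r x n) of ||Y - U^T B||_F^2 + lam ||U||_F^2, where Y (c x n) is the label matrix, B (r x n) holds one code per sample, and U (r x c) holds latent class factors. It alternates a ridge update for U with discrete cyclic coordinate descent over the rows (bits) of B. This is not DSAH's actual formulation: the attribute-alignment term and the per-modality hash functions are omitted, and all function names (`objective`, `update_U`, `update_B_bitwise`, `train`, `hamming_rank`) are hypothetical.

```python
import numpy as np


def objective(Y, U, B, lam):
    """Hypothetical loss: label reconstruction error plus a ridge penalty on U."""
    return np.sum((Y - U.T @ B) ** 2) + lam * np.sum(U ** 2)


def update_U(Y, B, lam):
    """Closed-form ridge solution U = (B B^T + lam I)^{-1} B Y^T, shape r x c."""
    r = B.shape[0]
    return np.linalg.solve(B @ B.T + lam * np.eye(r), B @ Y.T)


def update_B_bitwise(Y, U, B):
    """Discrete cyclic coordinate descent: each bit (row of B) gets its exact minimizer."""
    B = B.copy()
    Q = U @ Y   # r x n, linear term of the objective in B
    G = U @ U.T  # r x r, interaction weights between bits
    for k in range(B.shape[0]):
        others = np.arange(B.shape[0]) != k
        # With +/-1 entries, b_k b_k^T = n is constant, so the subproblem is linear in b_k
        # and its closed-form minimizer is sign(q_k - sum_{l != k} G[k, l] b_l); ties -> +1.
        score = Q[k] - G[k, others] @ B[others]
        B[k] = np.where(score >= 0, 1.0, -1.0)
    return B


def train(Y, r, lam=1.0, iters=10, seed=0):
    """Alternate the U and B updates; returns final U, B and the loss after each iteration."""
    rng = np.random.default_rng(seed)
    B = np.where(rng.standard_normal((r, Y.shape[1])) >= 0, 1.0, -1.0)
    history = []
    for _ in range(iters):
        U = update_U(Y, B, lam)
        B = update_B_bitwise(Y, U, B)
        history.append(objective(Y, U, B, lam))
    return U, B, history


def hamming_rank(query_code, db_codes):
    """Rank database codes (columns) by Hamming distance to a +/-1 query code."""
    r = db_codes.shape[0]
    dist = (r - query_code @ db_codes) / 2  # for +/-1 codes: d_H = (r - <b_q, b_i>) / 2
    return np.argsort(dist, kind="stable"), dist


if __name__ == "__main__":
    # Toy one-hot labels: 12 samples spread over 3 classes.
    Y = np.zeros((3, 12))
    Y[np.arange(12) % 3, np.arange(12)] = 1
    U, B, hist = train(Y, r=8, lam=0.5, iters=15)
    order, dist = hamming_rank(B[:, 0], B)
    print(hist[-1], order[:4])
```

Because each step exactly minimizes the loss over U or over one bit with everything else fixed, the loss is non-increasing across iterations; the code length `r`, the weight `lam`, and the toy labels are illustrative choices only.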