Object-Location-Aware Hashing for Multi-Label Image Retrieval via Automatic Mask Learning

IEEE Trans Image Process. 2018 Sep;27(9):4490-4502. doi: 10.1109/TIP.2018.2839522.

Abstract

Learning-based hashing is a leading approach of approximate nearest neighbor search for large-scale image retrieval. In this paper, we develop a deep supervised hashing method for multi-label image retrieval, in which we propose to learn a binary "mask" map that can identify the approximate locations of objects in an image, so that we use this binary "mask" map to obtain length-limited hash codes which mainly focus on an image's objects but ignore the background. The proposed deep architecture consists of four parts: 1) a convolutional sub-network to generate effective image features; 2) a binary "mask" sub-network to identify image objects' approximate locations; 3) a weighted average pooling operation based on the binary "mask" to obtain feature representations and hash codes that pay most attention to foreground objects but ignore the background; and 4) the combination of a triplet ranking loss designed to preserve relative similarities among images and a cross entropy loss defined on image labels. We conduct comprehensive evaluations on four multi-label image data sets. The results indicate that the proposed hashing method achieves superior performance gains over the state-of-the-art supervised or unsupervised hashing baselines.