Task-dependent visual-codebook compression

Rongrong Ji; Hongxun Yao; Wei Liu; Xiaoshuai Sun; Qi Tian

doi:10.1109/TIP.2011.2176950

IEEE Trans Image Process. 2012 Apr;21(4):2282-93. doi: 10.1109/TIP.2011.2176950. Epub 2011 Nov 22.

Authors

Rongrong Ji¹, Hongxun Yao, Wei Liu, Xiaoshuai Sun, Qi Tian

Affiliation

¹ Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China.

PMID: 22128004
DOI: 10.1109/TIP.2011.2176950

Abstract

A visual codebook is a fundamental component of many state-of-the-art computer vision systems. Most existing codebooks are built by quantizing local feature descriptors extracted from training images, and each image is then represented as a high-dimensional bag-of-words histogram. Such a description is highly redundant: only a few bins are nonzero, and they are sparsely distributed, which makes it inefficient in both storage and retrieval. Furthermore, most existing codebooks are built solely from the visual statistics of local descriptors, without considering the supervision labels available from the subsequent recognition or classification tasks. In this paper, we propose a task-dependent codebook compression framework to address these two problems. First, we learn a compression function that maps an originally high-dimensional codebook into a compact codebook while maintaining its visual discriminability. This is achieved by a codeword sparse coding scheme with Lasso regression, which minimizes the descriptor distortions of training images after codebook compression. Second, we adapt our codebook compression to the subsequent recognition or classification tasks by introducing a label constraint kernel (LCK) into our compression loss function. In particular, our LCK can model heterogeneous kinds of supervision, i.e., (partial) category labels, correlative semantic annotations, and image query logs. We validated our codebook compression in three computer vision tasks: 1) object recognition in PASCAL Visual Object Classes 2007 (VOC 2007); 2) near-duplicate image retrieval in UKBench; and 3) web image search in a collection of 0.5 million Flickr photographs. Our compressed codebook shows superior performance over several state-of-the-art supervised and unsupervised codebooks.
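
The sparse coding step mentioned in the abstract can be illustrated with a generic Lasso solver. The abstract does not state the paper's exact objective, so the sketch below is only an assumption-laden illustration, not the authors' method: it approximates one vector (standing in for an original codeword) as a sparse combination of a few atoms (standing in for compact codewords) by minimizing squared reconstruction error plus an L1 penalty, using plain coordinate descent. The names `soft_threshold`, `lasso_code`, `atoms`, and `lam` are hypothetical and chosen for this example only.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_code(target, atoms, lam=0.1, n_iters=100):
    """Coordinate descent for
        0.5 * ||target - sum_j a[j] * atoms[j]||^2 + lam * ||a||_1.

    target: list of floats (hypothetical stand-in for an original codeword)
    atoms:  list of lists of floats (hypothetical compact codewords)
    Returns the coefficient list a, one entry per atom.
    """
    coeffs = [0.0] * len(atoms)
    residual = list(target)  # target minus the current reconstruction
    sq_norms = [sum(v * v for v in atom) for atom in atoms]
    for _ in range(n_iters):
        for j, atom in enumerate(atoms):
            if sq_norms[j] == 0.0:
                continue  # an all-zero atom cannot explain anything
            # Correlation of atom j with the residual, with atom j's
            # own current contribution added back in.
            rho = sum(v * r for v, r in zip(atom, residual))
            rho += coeffs[j] * sq_norms[j]
            new_coeff = soft_threshold(rho, lam) / sq_norms[j]
            delta = new_coeff - coeffs[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, atom)]
                coeffs[j] = new_coeff
    return coeffs


# Toy usage with three orthonormal atoms (assumed data, not from the paper).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs = lasso_code([2.0, 0.05, -1.0], atoms, lam=0.1)
print(coeffs)
```

In this toy case the small middle component falls below the penalty `lam` and its coefficient is set exactly to zero, which is the sparsity-inducing behavior that makes Lasso attractive for producing compact representations. A larger `lam` would zero out more coefficients at the cost of higher reconstruction error.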

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Artificial Intelligence
Data Compression / methods*
Image Enhancement / methods
Image Interpretation, Computer-Assisted / methods*
Pattern Recognition, Automated / methods*
Radiology Information Systems*
Subtraction Technique*