Information Clustering Using Manifold-Based Optimization of the Bag-of-Features Representation

IEEE Trans Cybern. 2018 Jan;48(1):52-63. doi: 10.1109/TCYB.2016.2623581. Epub 2016 Nov 10.

Abstract

In this paper, a manifold-based dictionary learning method for the bag-of-features (BoF) representation, optimized toward information clustering, is proposed. First, the spectral representation, which unwraps the manifolds of the data and provides better clustering solutions, is formed. Then, a new dictionary is learned in order to make the histogram space, i.e., the space where the BoF histograms exist, as similar as possible to the spectral space. The ability of the proposed method to improve the clustering solutions is demonstrated using a wide range of datasets: two image datasets (the 15-scene dataset and the Corel image dataset), one video dataset (the KTH dataset), and one text dataset (the RT-2k dataset). The proposed method improves both the internal and the external clustering criteria for two different clustering algorithms: 1) k-means and 2) spectral clustering. Also, the optimized histogram space can be used to directly assign a new object to its cluster, instead of using the spectral space (which requires reapplying the spectral clustering algorithm or using incremental spectral clustering techniques). Finally, the learned representation is also evaluated using an information retrieval setup, and it is demonstrated that it improves the retrieval precision over the baseline BoF representation.
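The idea of matching the histogram space to a target space can be illustrated with a minimal Python sketch. The abstract does not state the actual objective or optimization procedure, so everything here is an assumption made for illustration: the function names (`bof_histogram`, `similarity_matrix`, `space_mismatch`), the hard nearest-codeword assignment, the Gaussian similarity with width `sigma`, and the squared-difference mismatch are all hypothetical. The `target_embedding` argument merely stands in for the spectral representation, which is not computed here.

```python
import math


def bof_histogram(features, dictionary):
    """L1-normalized BoF histogram of one object.

    Assumption: hard assignment of each feature vector to its nearest
    codeword (Euclidean distance); the paper may use a different scheme.
    """
    counts = [0] * len(dictionary)
    for f in features:
        nearest = min(range(len(dictionary)),
                      key=lambda k: math.dist(f, dictionary[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(dictionary)
    return [c / total for c in counts]


def similarity_matrix(points, sigma=1.0):
    """Pairwise Gaussian similarities (an assumed choice of similarity)."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2))
             for q in points] for p in points]


def space_mismatch(histograms, target_embedding, sigma=1.0):
    """Sum of squared differences between the pairwise similarities of the
    histogram space and those of a target space (hypothetical criterion)."""
    s = similarity_matrix(histograms, sigma)
    t = similarity_matrix(target_embedding, sigma)
    n = len(histograms)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Toy data: two codewords and two objects with a few 2-D features each.
    dictionary = [(0.0, 0.0), (1.0, 1.0)]
    objects = [
        [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8)],
        [(0.0, 0.2), (0.1, 0.1)],
    ]
    histograms = [bof_histogram(obj, dictionary) for obj in objects]
    # Placeholder 1-D target embedding, standing in for the spectral space.
    target = [[0.0], [1.0]]
    print(histograms)
    print(space_mismatch(histograms, target))
```

Under these assumptions, a dictionary learning step would adjust the codewords so that a mismatch of this kind decreases. Because hard assignment is not differentiable, a gradient-based variant would need a soft assignment instead, which is a design choice left open here.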