A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data

IEEE Trans Neural Netw Learn Syst. 2016 Dec;27(12):2499-2512. doi: 10.1109/TNNLS.2015.2490080. Epub 2015 Oct 29.

Abstract

Under the framework of spectral clustering, the key of subspace clustering is building a similarity graph, which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and l2 -norm-based representation, and have achieved the state-of-the-art performance. However, these methods have suffered from the following two limitations. First, the time complexities of these methods are at least proportional to the cube of the data size, which make those methods inefficient for solving the large-scale problems. Second, they cannot cope with the out-of-sample data that are not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes the representation-based subspace clustering algorithms feasible to cluster both the out-of-sample and the large-scale data. Under our framework, the large-scale problem is tackled by converting it as the out-of-sample problem in the manner of sampling, clustering, coding, and classifying. Furthermore, we give an estimation for the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering a large-scale data set.