Projected Affinity Values for Nyström Spectral Clustering

Entropy (Basel). 2018 Jul 10;20(7):519. doi: 10.3390/e20070519.

Abstract

In kernel methods, Nyström approximation is a popular way of calculating out-of-sample extensions and can be further applied to large-scale data clustering and classification tasks. Given a new data point, Nyström employs its empirical affinity vector, k, for calculation. This vector is assumed to be a proper measurement of the similarity between the new point and the training set. In this paper, we suggest replacing the affinity vector by its projections on the leading eigenvectors learned from the training set, i.e., using k*=∑i=1ckTuiui instead, where ui is the i-th eigenvector of the training set and c is the number of eigenvectors used, which is typically equal to the number of classes designed by users. Our work is motivated by the constraints that in kernel space, the kernel-mapped new point should (a) also lie on the unit sphere defined by the Gaussian kernel and (b) generate training set affinity values close to k. These two constraints define a Quadratic Optimization Over a Sphere (QOOS) problem. In this paper, we prove that the projection on the leading eigenvectors, rather than the original affinity vector, is the solution to the QOOS problem. The experimental results show that the proposed replacement of k by k* slightly improves the performance of the Nyström approximation. Compared with other affinity matrix modification methods, our k* obtains comparable or higher clustering performance in terms of accuracy and Normalized Mutual Information (NMI).

Keywords: Nyström approximation; empirical affinity; machine learning; out-of-sample.