Transforming Complex Problems Into K-Means Solutions

IEEE Trans Pattern Anal Mach Intell. 2023 Jul;45(7):9149-9168. doi: 10.1109/TPAMI.2023.3237667. Epub 2023 Jun 5.

Abstract

K-means is a fundamental clustering algorithm widely used in both academic and industrial applications. Its popularity can be attributed to its simplicity and efficiency. Studies show the equivalence of K-means to principal component analysis, non-negative matrix factorization, and spectral clustering. However, these studies focus on standard K-means with squared euclidean distance. In this review paper, we unify the available approaches in generalizing K-means to solve challenging and complex problems. We show that these generalizations can be seen from four aspects: data representation, distance measure, label assignment, and centroid updating. As concrete applications of transforming problems into modified K-means formulation, we review the following applications: iterative subspace projection and clustering, consensus clustering, constrained clustering, domain adaptation, and outlier detection.