Community Partitioning over Feature-Rich Networks Using an Extended K-Means Method

Entropy (Basel). 2022 Apr 29;24(5):626. doi: 10.3390/e24050626.

Abstract

This paper proposes a meaningful and effective extension of the celebrated K-means algorithm to detect communities in feature-rich networks, based on our non-summability assumption. We approximate, in the least-squares sense, the given matrices of inter-node links and feature values, which leads to a straightforward extension of the conventional K-means clustering method as an alternating minimization strategy for the criterion. The method works in a two-fold space embracing both the network nodes and the features. The metric used is a weighted sum of the squared Euclidean distances in the feature and network spaces. To tackle the so-called curse of dimensionality, we extend the method to a version that uses cosine distances between entities and centers. One more version of our method is based on the Manhattan distance metric. We conduct computational experiments to test our method and compare its performance with that of competing popular algorithms on synthetic and real-world datasets. The cosine-based version of the extended K-means typically wins on high-dimensional real-world datasets, whereas the Manhattan-based version wins on most synthetic datasets.

Keywords: K-means clustering; cluster analysis; community detection; data recovery; feature-rich networks; node-attributed networks; nonsummability assumption.
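
The abstract describes an alternating-minimization scheme over a two-fold space, with node-to-center distances given by a weighted sum of squared Euclidean distances in the feature and network spaces. The following is a minimal Python sketch of that idea, not the authors' implementation: the feature matrix `Y`, the link matrix `A`, and the trade-off weights `rho` and `xi` are assumed names and parameters introduced here for illustration only.

```python
import numpy as np

def extended_kmeans(Y, A, k, rho=1.0, xi=1.0, n_iter=100, seed=0):
    """Sketch of a K-means-style alternating minimization over a two-fold space.

    Y : (n, p) matrix of node feature values.
    A : (n, n) matrix of inter-node links (each row is a node's link profile).
    The node-to-center distance is a weighted sum of squared Euclidean
    distances in the feature and network spaces; rho and xi are hypothetical
    trade-off weights (not specified in the abstract).
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    labels = rng.integers(0, k, size=n)  # random initial partition

    for _ in range(n_iter):
        # Update step: centers are within-cluster means in each of the two spaces.
        C, L = [], []
        for v in range(k):
            members = labels == v
            if members.any():
                C.append(Y[members].mean(axis=0))
                L.append(A[members].mean(axis=0))
            else:
                j = rng.integers(n)  # re-seed an empty cluster at a random node
                C.append(Y[j])
                L.append(A[j])
        C, L = np.array(C), np.array(L)

        # Assignment step: minimize the weighted sum of squared distances.
        d_feat = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        d_net = ((A[:, None, :] - L[None, :, :]) ** 2).sum(axis=2)
        new_labels = np.argmin(rho * d_feat + xi * d_net, axis=1)

        if np.array_equal(new_labels, labels):  # converged: partition unchanged
            break
        labels = new_labels
    return labels
```

The cosine- and Manhattan-based versions mentioned in the abstract would replace the squared Euclidean distances in the assignment step with the corresponding dissimilarities; the abstract does not specify how the centers are updated in those variants, so they are not sketched here.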