Unsupervised ranking of clustering algorithms by INFOMAX

Sandipan Sikdar; Animesh Mukherjee; Matteo Marsili

doi:10.1371/journal.pone.0239331

PLoS One. 2020 Oct 26;15(10):e0239331. doi: 10.1371/journal.pone.0239331. eCollection 2020.

Authors

Sandipan Sikdar¹, Animesh Mukherjee², Matteo Marsili³

Affiliations

¹ RWTH Aachen University, Aachen, Germany.
² Indian Institute of Technology Kharagpur, Kharagpur, India.
³ Abdus Salam International Centre for Theoretical Physics, Trieste, Italy.

Abstract

Clustering and community detection provide a concise way of extracting meaningful information from large datasets. An ever-growing plethora of data clustering and community detection algorithms has been proposed. In this paper, we address the question of ranking the performance of clustering algorithms for a given dataset. We show that, for hard clustering and community detection, Linsker's Infomax principle can be used to rank clustering algorithms. In brief, the algorithm that yields the highest value of the entropy of the partition, for a given number of clusters, is the best one. We show, on a wide range of datasets of various sizes and topological structures, that the ranking provided by the entropy of the partition over a variety of partitioning algorithms is strongly correlated with the overlap with a ground truth partition. The code related to the project is available at https://github.com/Sandipan99/Ranking_cluster_algorithms.
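
The ranking criterion described in the abstract can be illustrated with a minimal sketch. It assumes that "entropy of the partition" means the Shannon entropy of the cluster-size distribution, H = -Σ_k (n_k/N) log(n_k/N), where n_k is the size of cluster k and N the number of items; the abstract does not state the formula, so this is an assumption. The function names and the example label lists are hypothetical and are not taken from the authors' repository.

```python
import math
from collections import Counter


def partition_entropy(labels):
    """Shannon entropy (in nats) of the cluster-size distribution.

    Assumption: this is what "entropy of the partition" refers to.
    `labels` holds one cluster label per item (hard clustering).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def rank_by_entropy(partitions):
    """Return (name, entropy) pairs sorted by decreasing entropy.

    `partitions` maps a (hypothetical) algorithm name to its label list.
    The comparison is only meaningful when every partition has the same
    number of clusters, as the abstract requires.
    """
    scores = {name: partition_entropy(lab) for name, lab in partitions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up outputs of two algorithms on 8 items, 3 clusters each.
partitions = {
    "algo_a": [0, 0, 0, 0, 0, 0, 1, 2],  # cluster sizes 6, 1, 1
    "algo_b": [0, 0, 0, 1, 1, 1, 2, 2],  # cluster sizes 3, 3, 2
}
ranking = rank_by_entropy(partitions)
for name, h in ranking:
    print(f"{name}: H = {h:.4f}")
```

In this toy input the more balanced partition (`algo_b`) has the higher entropy and is therefore ranked first under the criterion. The sketch only computes the ranking score; it does not reproduce the comparison with ground truth partitions reported in the paper.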

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Cluster Analysis
Databases, Factual
User-Computer Interface*

Grants and funding

SS was supported by the Sandwich Training Educational Programme (STEP) and by the Simons Foundation under the Simons Visitor programme. AM was supported by the Simons Foundation under the Simons Associateship Programme. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.