Improved mutual information measure for clustering, classification, and community detection

Phys Rev E. 2020 Apr;101(4-1):042304. doi: 10.1103/PhysRevE.101.042304.

Abstract

The information-theoretic measure known as mutual information is widely used to quantify the similarity of two different labelings or divisions of the same set of objects, such as arise, for instance, in clustering and classification problems in machine learning or community detection problems in network science. Here we argue that the standard mutual information, as commonly defined, omits a crucial term that can become large under real-world conditions, producing results that can be substantially in error. We derive an expression for this missing term and hence write a corrected mutual information that gives accurate results even in cases where the standard measure fails. We discuss practical implementation of the new measure and give example applications.
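
As a minimal sketch of the standard (uncorrected) measure referred to above, the following Python function computes the mutual information, in nats, of two labelings of the same objects from their joint counts. The function name and the toy labelings are illustrative assumptions, not taken from the paper, and the corrected measure is not reproduced here because the abstract does not state its formula. The last case shows one way the standard measure can mislead: a labeling that places every object in its own group scores exactly as high as a perfect match.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Standard mutual information (in nats) between two labelings.

    Hypothetical helper for illustration only; it implements the usual
    plug-in formula, not the corrected measure proposed in the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same set of objects")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must be non-empty")

    count_a = Counter(labels_a)                 # group sizes in labeling A
    count_b = Counter(labels_b)                 # group sizes in labeling B
    count_ab = Counter(zip(labels_a, labels_b)) # contingency table entries

    # I = sum_{a,b} (c_ab / n) * log( n * c_ab / (c_a * c_b) )
    return sum(
        (c / n) * log(n * c / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


truth = [0, 0, 1, 1]
identical = mutual_information(truth, [0, 0, 1, 1])    # perfect agreement
independent = mutual_information(truth, [0, 1, 0, 1])  # no shared information
singletons = mutual_information(truth, [0, 1, 2, 3])   # every object alone

print(identical, independent, singletons)
```

In this toy case the singleton labeling carries no useful grouping at all, yet its standard mutual information with `truth` equals that of the identical labeling, because both reach the entropy of `truth`.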