Differential Consistency Analysis: Which Similarity Measures can be Applied in Drug Discovery?

Mol Inform. 2021 Jul;40(7):e2060017. doi: 10.1002/minf.202060017. Epub 2021 Apr 23.

Abstract

Similarity measures are widely used in areas ranging from taxonomy to cheminformatics. A large number of similarity and distance measures (collectively, comparative measures) have been introduced, but only a few studies have examined the relationships between them. We present a thorough analytical study of the conditions under which two comparative measures provide equivalent results over a given set of molecules. A key part of this work is the introduction of a novel way to study the consistency between comparative measures: the differential consistency analysis (DCA). This tool reveals how consistency can be established analytically with minimal (or no) assumptions. We found that the consensus between the Tanimoto and Cosine coefficients improves when the reference is chosen so that its similarity to the rest of the molecules varies less, or when the molecules are represented in a way that does not depend strongly on their size (i.e., on the bit frequency in the chosen fingerprint representation). The presented derivations are only generic examples; DCA can be applied widely, to all binary similarity coefficients introduced so far, independently of the molecular representation.
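To make the notion of consistency between two coefficients concrete, the following is a minimal sketch, not the DCA derivation itself. It assumes binary fingerprints stored as sets of on-bit indices, uses the standard definitions of the Tanimoto coefficient, c / (a + b - c), and the Cosine coefficient, c / sqrt(a * b), where a and b are the on-bit counts and c is the number of shared on-bits, and counts how many pairs of molecules are ranked in the same order by both coefficients relative to one reference. The function names and the toy fingerprints are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def tanimoto(x, y):
    """Tanimoto coefficient for two non-empty sets of on-bit indices."""
    c = len(x & y)
    return c / (len(x) + len(y) - c)


def cosine(x, y):
    """Cosine coefficient for two non-empty sets of on-bit indices."""
    c = len(x & y)
    return c / sqrt(len(x) * len(y))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def consistency_fraction(reference, molecules):
    """Fraction of molecule pairs that both coefficients order identically.

    A pair counts as consistent when the difference in Tanimoto similarity
    to the reference has the same sign as the difference in Cosine similarity.
    """
    t = [tanimoto(reference, m) for m in molecules]
    s = [cosine(reference, m) for m in molecules]
    pairs = list(combinations(range(len(molecules)), 2))
    agree = sum(1 for i, j in pairs if sign(t[i] - t[j]) == sign(s[i] - s[j]))
    return agree / len(pairs)


# Toy fingerprints (hypothetical data): the molecules differ strongly in size.
reference = set(range(10))                 # 10 on-bits
mol_a = set(range(5)) | set(range(10, 15)) # 10 on-bits, 5 shared with reference
mol_b = {0, 1, 2}                          # 3 on-bits, all shared
mol_c = set(range(12))                     # 12 on-bits, 10 shared

fraction = consistency_fraction(reference, [mol_a, mol_b, mol_c])
print(fraction)
```

In this toy set, Tanimoto ranks mol_a above mol_b while Cosine ranks mol_b above mol_a, so two of the three pairs are ordered consistently. The one inversion comes from the small molecule mol_b, which illustrates the size dependence mentioned in the abstract.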

Keywords: Tanimoto index; chemoinformatics; differential consistency analysis; drug design; molecular fingerprints; ranking; similarity indices.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cheminformatics
  • Drug Discovery*