New diversity calculations algorithms used for compound selection

J Chem Inf Comput Sci. 2002 Mar-Apr;42(2):249-58. doi: 10.1021/ci0100649.

Abstract

Some modifications were made to the previously described Centroid diversity sorting algorithm, which uses the cosine similarity metric. The modified algorithm is suitable for working with large databases on personal computers: for example, diversity sorting of a database of more than a million records takes less than 9 h (Pentium III, 800 MHz). The article also examines the problem of selecting new compounds for addition to an existing collection so as to maximize the diversity of that collection, and it describes a new algorithm for the selection of heterocyclic compounds.
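The abstract gives no details of the Centroid algorithm itself, so what follows is only an illustrative sketch under stated assumptions, not the authors' method. It shows one reason a cosine metric can suit centroid-style diversity calculations on large databases. For unit-length vectors, the sum of cosine similarities between a vector u and a set {v_j} equals u · Σ_j v_j, so a single running sum vector replaces all pairwise comparisons. The sketch applies that identity to a simple greedy selection of new compounds into an existing collection. The plain float descriptor vectors (all of equal length), the greedy "least total similarity" criterion, and the helper names `normalize`, `dot` and `select_diverse` are hypothetical and are not taken from the article.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a descriptor vector to unit length (a zero vector stays all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0] * len(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def select_diverse(collection: Sequence[Sequence[float]],
                   candidates: Sequence[Sequence[float]],
                   k: int) -> List[int]:
    """Hypothetical greedy picker: return indices of up to k candidates, each chosen
    to have the smallest summed cosine similarity to the collection plus earlier picks.

    Uses sum_j cos(u, v_j) == dot(unit(u), sum_j unit(v_j)), so only one running
    sum vector (the unscaled centroid of the unit vectors) is kept.
    """
    if not candidates or k <= 0:
        return []
    units = [normalize(c) for c in candidates]
    # Running sum of the unit vectors already in the collection (assumes equal lengths).
    centroid = [0.0] * len(units[0])
    for v in collection:
        for i, x in enumerate(normalize(v)):
            centroid[i] += x

    remaining = list(range(len(units)))
    chosen: List[int] = []
    for _ in range(min(k, len(units))):
        # One dot product per remaining candidate gives its total similarity to the set.
        best = min(remaining, key=lambda idx: dot(units[idx], centroid))
        chosen.append(best)
        remaining.remove(best)
        # Adding the pick to the collection only requires updating the sum vector.
        for i, x in enumerate(units[best]):
            centroid[i] += x
    return chosen


if __name__ == "__main__":
    existing = [[1.0, 0.0, 0.0]]
    new = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 0.0]]
    print(select_diverse(existing, new, 2))  # [1, 2]
```

In this sketch each selection step costs one dot product per remaining candidate rather than one per candidate and collection member; ties are broken by taking the lowest candidate index.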