Graph-Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures

Mol Inform. 2013 Feb;32(2):165-78. doi: 10.1002/minf.201200110. Epub 2013 Feb 5.

Abstract

Consensus clustering methods have been used successfully to combine multiple classifiers in many areas, such as machine learning, applied statistics, pattern recognition and bioinformatics. In this paper, consensus clustering is used to combine clusterings of chemical structures in order to improve the separation of biologically active molecules from inactive ones within each cluster. Two graph-based consensus clustering methods were examined. The Quality Partition Index (QPI) was used to evaluate the clusterings, and the results were compared with those of Ward's clustering method. Two subsets (DS1 and DS2) of the MDL Drug Data Report (MDDR) database, covering homogeneous and heterogeneous sets of molecules, were used for the experiments, and each was represented by two 2D fingerprints. The consensus partitions were obtained in two ways: by combining multiple runs of an individual clustering method, and by combining single runs of multiple individual clustering methods. In both cases, the results showed that graph-based consensus clustering methods can improve the effectiveness of chemical structure clustering.
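
As a rough illustration of the general idea of graph-based consensus clustering, the sketch below builds a co-association graph from several clusterings and reads a consensus partition off it. It is not the authors' implementation: the abstract does not name the two graph-based methods examined, so the co-association weighting, the majority threshold and the connected-components step are all assumptions chosen for brevity, and the function names `co_association` and `consensus_partition` are hypothetical.

```python
from itertools import combinations


def co_association(clusterings):
    """Edge weights of the co-association graph (illustrative helper).

    Each weight is the fraction of input clusterings in which the two
    items were assigned to the same cluster.
    """
    n = len(clusterings[0])
    weights = {}
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in clusterings if labels[i] == labels[j])
        weights[(i, j)] = together / len(clusterings)
    return weights


def consensus_partition(clusterings, threshold=0.5):
    """Consensus labels from the thresholded co-association graph.

    Assumption: edges with weight above `threshold` are kept and each
    connected component becomes one consensus cluster. Published
    graph-based methods typically use a proper graph partitioner instead.
    """
    n = len(clusterings[0])
    parent = list(range(n))  # union-find forest over the items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), w in co_association(clusterings).items():
        if w > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three made-up clusterings of six molecules (labels are arbitrary per run).
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
consensus = consensus_partition(runs)
print(consensus)
```

In this toy input the second run splits the molecules differently from the other two, and the majority threshold recovers the two groups that most runs agree on. With real fingerprint-based clusterings, the input label lists would come from the individual clustering runs being combined.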

Keywords: Compound selection; Ensemble generation; Graph partitioning; High-throughput screening; Individual clusterings; Molecular dataset.