Clustering biomolecular complexes by residue contacts similarity

João P G L M Rodrigues; Mikaël Trellet; Christophe Schmitz; Panagiotis Kastritis; Ezgi Karaca; Adrien S J Melquiond; Alexandre M J J Bonvin

doi:10.1002/prot.24078

Proteins. 2012 Jul;80(7):1810-7. doi: 10.1002/prot.24078. Epub 2012 May 8.

Authors

João P G L M Rodrigues¹, Mikaël Trellet, Christophe Schmitz, Panagiotis Kastritis, Ezgi Karaca, Adrien S J Melquiond, Alexandre M J J Bonvin

Affiliation

¹ Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, 3584 CH Utrecht, The Netherlands.

PMID: 22489062
DOI: 10.1002/prot.24078

Abstract

Inaccuracies in computational molecular modeling methods are often counterweighed by brute-force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions. Albeit widely used, these measures suffer from several theoretical and technical limitations (e.g., choice of regions for fitting) that impair their application in multicomponent systems (N > 2), large-scale studies (e.g., interactomes), and other time-critical scenarios. We present here a simple similarity measure for structural clustering based on atomic contacts, the fraction of common contacts, and compare it with the most used similarity measure of the protein docking community, the interface backbone RMSD. We show that this method produces very compact clusters in remarkably short time when applied to a collection of binary and multicomponent protein-protein and protein-DNA complexes. Furthermore, it allows easy clustering of similar conformations of multicomponent symmetrical assemblies in which chain permutations can occur. Simple contact-based metrics should be applicable to other structural biology clustering problems, in particular for time-critical or large-scale endeavors.
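
As an illustration of the contact-based measure described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes each model has already been reduced to a set of residue-residue contacts, and it assumes the fraction is taken relative to the first model's contact set, so the score is not symmetric. The names `fraction_common_contacts` and `ignore_chains`, the contact representation, and the toy data are all hypothetical.

```python
from typing import AbstractSet, Hashable, Set, Tuple

# Assumed representation: a residue is (chain_id, residue_number),
# and a contact is a pair of residues from different chains.
Residue = Tuple[str, int]
Contact = Tuple[Residue, Residue]


def fraction_common_contacts(ref: AbstractSet[Hashable],
                             other: AbstractSet[Hashable]) -> float:
    """Fraction of the contacts in `ref` that also occur in `other`.

    Assumption: normalising by the size of `ref` (asymmetric score).
    An empty reference set is given a score of 0.0 by convention here.
    """
    if not ref:
        return 0.0
    return len(ref & other) / len(ref)


def ignore_chains(contacts: AbstractSet[Contact]) -> Set[Tuple[int, int]]:
    """Drop chain labels so that chain permutations in symmetric
    assemblies map onto the same contact (hypothetical helper)."""
    return {tuple(sorted((a[1], b[1]))) for a, b in contacts}


# Toy data, invented for illustration only.
model_1 = {
    (("A", 10), ("B", 42)),
    (("A", 11), ("B", 42)),
    (("A", 15), ("B", 50)),
    (("A", 16), ("B", 51)),
}
model_2 = {
    (("A", 10), ("B", 42)),
    (("A", 11), ("B", 42)),
    (("A", 15), ("B", 50)),
    (("A", 30), ("B", 60)),
    (("A", 31), ("B", 61)),
}

fcc_12 = fraction_common_contacts(model_1, model_2)  # 3 shared / 4 in model_1
fcc_21 = fraction_common_contacts(model_2, model_1)  # 3 shared / 5 in model_2

# Same interface with the chain labels swapped.
swapped = {(("B", 10), ("A", 42))}
original = {(("A", 10), ("B", 42))}
fcc_swapped = fraction_common_contacts(ignore_chains(original),
                                       ignore_chains(swapped))
```

Because the comparison reduces to set intersections, no structural superposition or choice of fitting region is needed, which is consistent with the speed advantage the abstract reports over RMSD-based clustering. How contacts are defined (distance cutoff, atom selection) is left open here, since the abstract does not specify it.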

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Cluster Analysis*
DNA / chemistry*
DNA / metabolism
Models, Chemical*
Models, Molecular
Multiprotein Complexes / chemistry*
Multiprotein Complexes / metabolism
Protein Binding
Proteins / chemistry
Proteins / metabolism

Substances

Multiprotein Complexes
Proteins
DNA