Clustering biomolecular complexes by residue contacts similarity

Proteins. 2012 Jul;80(7):1810-7. doi: 10.1002/prot.24078. Epub 2012 May 8.

Abstract

Inaccuracies in computational molecular modeling methods are often counterweighed by brute-force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions. Albeit widely used, these measures suffer from several theoretical and technical limitations (e.g., choice of regions for fitting) that impair their application in multicomponent systems (N > 2), large-scale studies (e.g., interactomes), and other time-critical scenarios. We present here a simple similarity measure for structural clustering based on atomic contacts (the fraction of common contacts) and compare it with the most widely used similarity measure in the protein docking community, the interface backbone RMSD. We show that this method produces very compact clusters in a remarkably short time when applied to a collection of binary and multicomponent protein-protein and protein-DNA complexes. Furthermore, it allows easy clustering of similar conformations of multicomponent symmetrical assemblies in which chain permutations can occur. Simple contact-based metrics should be applicable to other structural biology clustering problems, in particular for time-critical or large-scale endeavors.
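To make the idea of a fraction of common contacts concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each model has already been reduced to a set of intermolecular contacts, written here as hypothetical tuples (chain_a, residue_a, chain_b, residue_b); the abstract does not specify the contact definition, the distance cutoff, or the normalization, so normalizing by the reference model's contact count is an assumption. The helper strip_chains is likewise a hypothetical, deliberately crude way to illustrate tolerance to chain permutations in symmetrical assemblies.

```python
from typing import FrozenSet, Tuple

# Hypothetical contact representation: (chain_a, residue_a, chain_b, residue_b).
Contact = Tuple[str, int, str, int]


def fraction_common_contacts(ref: FrozenSet, other: FrozenSet) -> float:
    """Fraction of the contacts in `ref` that also occur in `other`.

    Assumption: the score is normalized by the size of `ref`, which makes
    it asymmetric (swapping the arguments can change the value).
    """
    if not ref:
        return 0.0
    return len(ref & other) / len(ref)


def strip_chains(contacts: FrozenSet[Contact]) -> FrozenSet[Tuple[int, int]]:
    """Drop chain labels so that relabeled (permuted) chains compare equal.

    Crude illustration only: it assumes both chains share a residue
    numbering for which ignoring the chain identity is meaningful.
    """
    return frozenset((res_a, res_b) for _, res_a, _, res_b in contacts)


# Toy contact sets for two docking models (made-up values).
model_a = frozenset({
    ("A", 10, "B", 55), ("A", 11, "B", 55),
    ("A", 12, "B", 60), ("A", 30, "B", 61),
})
model_b = frozenset({
    ("A", 10, "B", 55), ("A", 11, "B", 55), ("A", 12, "B", 60),
    ("A", 45, "B", 70), ("A", 46, "B", 71),
})
# Same interface as model_a, but with different chain labels.
model_c = frozenset(("C", ra, "D", rb) for _, ra, _, rb in model_a)

if __name__ == "__main__":
    print(fraction_common_contacts(model_a, model_b))  # 3 shared of 4 -> 0.75
    print(fraction_common_contacts(model_b, model_a))  # 3 shared of 5 -> 0.6
    print(fraction_common_contacts(model_a, model_c))  # labels differ -> 0.0
    print(fraction_common_contacts(strip_chains(model_a), strip_chains(model_c)))  # 1.0
```

Because the comparison only intersects sets, no structural superposition or choice of fitting region is needed, which is consistent with the speed advantage the abstract describes. A clustering step would then group models whose pairwise scores exceed a chosen threshold; that threshold and the clustering algorithm are not given in the abstract and are left out of this sketch.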

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis*
  • DNA / chemistry*
  • DNA / metabolism
  • Models, Chemical*
  • Models, Molecular
  • Multiprotein Complexes / chemistry*
  • Multiprotein Complexes / metabolism
  • Protein Binding
  • Proteins / chemistry
  • Proteins / metabolism

Substances

  • Multiprotein Complexes
  • Proteins
  • DNA