Using physical potentials and learned models to distinguish native binding interfaces from de novo designed interfaces that do not bind

Proteins. 2013 Nov;81(11):1919-30. doi: 10.1002/prot.24337. Epub 2013 Aug 19.

Abstract

Protein-protein interactions are a fundamental aspect of many biological processes. The advent of recombinant protein and computational techniques has enabled the rational design of proteins with novel binding capabilities. It is therefore desirable to predict which designed proteins are capable of binding in vitro. To this end, we have developed a learned classification model that combines energetic and non-energetic features. Our feature set is adapted from specialized potentials for aromatic interactions, hydrogen bonds, electrostatics, shape, and desolvation. A binding model built on these features was initially developed for CAPRI Round 21, achieving top results in the independent assessment. Here, we present a more thoroughly trained and validated model, and compare various support-vector machine kernels. The Gaussian kernel model classified both high-resolution complexes and designed nonbinders with 79–86% accuracy on independent test data. We also observe that multiple physical potentials for dielectric-dependent electrostatics and hydrogen bonding contribute to the enhanced predictive accuracy, suggesting that their combined information is much greater than that of any single energetics model. Finally, we examine how predictive performance changes as the model features or training data are varied, and observe unusual patterns of prediction in designed interfaces as compared with other data types.
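
For readers unfamiliar with the kernels being compared, the following is a minimal sketch of how a Gaussian (radial basis function) kernel scores the similarity of two feature vectors, alongside a linear kernel for contrast. It is illustrative only and is not the authors' implementation: the function names, the gamma value, and the three-component feature vectors are all hypothetical, and the abstract does not state the kernel parameters or feature scaling actually used.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2).

    gamma is an assumed value chosen for illustration, not one from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(samples, kernel):
    """Pairwise kernel values for a list of feature vectors."""
    return [[kernel(u, v) for v in samples] for u in samples]

# Hypothetical, made-up feature vectors (e.g. scaled electrostatic,
# hydrogen-bond, and desolvation terms); not data from the study.
interfaces = [
    [-1.2, -0.8, 0.6],
    [-1.0, -0.9, 0.7],
    [0.3, 0.1, -0.4],
]

K = gram_matrix(interfaces, gaussian_kernel)
```

In this sketch the first two vectors lie close together, so their Gaussian kernel value is near 1, while the third is distant and scores much lower. A support-vector machine trained on such a matrix can therefore draw nonlinear decision boundaries in the original feature space, which a linear kernel cannot.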

Keywords: machine learning; protein binding; protein complex; protein design; stacking interactions.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Models, Theoretical*
  • Protein Binding
  • Proteins / chemistry*
  • Software

Substances

  • Proteins