Mining the Protein Data Bank to improve prediction of changes in protein-protein binding

PLoS One. 2021 Nov 2;16(11):e0257614. doi: 10.1371/journal.pone.0257614. eCollection 2021.

Abstract

Predicting the effect of mutations on protein-protein interactions is important for relating structure to function, as well as for in silico affinity maturation. The effect of mutations on protein-protein binding energy (ΔΔG) can be predicted by a variety of atomic simulation methods involving full or limited flexibility, and explicit or implicit solvent. Methods which consider only limited flexibility are naturally more economical, and many of them are quite accurate; however, their results depend on the atomic coordinate set used. In this work we perform a sequence- and structure-based search of the Protein Data Bank to find additional coordinate sets, and repeat the calculation on each. The method increases precision and Positive Predictive Value, and decreases Root Mean Square Error, compared to using single structures. Given the ongoing growth of near-redundant structures in the Protein Data Bank, our method will only increase in applicability and accuracy.
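The idea of repeating a ΔΔG calculation over several coordinate sets and scoring the pooled result can be illustrated with a minimal sketch. The abstract does not state how the per-structure results are combined, so the simple arithmetic mean used here is an assumption, as are the function names, the sign convention (negative ΔΔG taken to mean improved binding), and the threshold of 0 kcal/mol used to call a prediction "positive".

```python
from math import sqrt
from statistics import mean


def combine_ddg(per_structure_ddg):
    """Pool ΔΔG predictions (kcal/mol) from several coordinate sets.

    Assumption: a plain arithmetic mean; the paper may combine them differently.
    """
    values = list(per_structure_ddg)
    if not values:
        raise ValueError("need at least one per-structure prediction")
    return mean(values)


def rmse(predicted, experimental):
    """Root Mean Square Error between predicted and experimental ΔΔG."""
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have equal length")
    return sqrt(mean([(p - e) ** 2 for p, e in zip(predicted, experimental)]))


def ppv(predicted, experimental, threshold=0.0):
    """Positive Predictive Value: TP / (TP + FP).

    Assumption: a mutation is 'positive' when ΔΔG < threshold
    (i.e. binding improves under the sign convention chosen here).
    Returns None when nothing is predicted positive.
    """
    called = [(p, e) for p, e in zip(predicted, experimental) if p < threshold]
    if not called:
        return None
    true_pos = sum(1 for _, e in called if e < threshold)
    return true_pos / len(called)


# Hypothetical numbers for illustration only; not data from the paper.
per_structure = {
    "mutA": [1.2, 0.8, 1.0],
    "mutB": [-0.5, -0.9, -0.7],
    "mutC": [0.3, -0.1, 0.1],
}
experimental = {"mutA": 1.1, "mutB": -0.6, "mutC": 0.4}

names = sorted(per_structure)
pooled = [combine_ddg(per_structure[n]) for n in names]
measured = [experimental[n] for n in names]

print("pooled ΔΔG:", dict(zip(names, pooled)))
print("RMSE:", rmse(pooled, measured))
print("PPV:", ppv(pooled, measured))
```

Comparing these metrics for the pooled values against the same metrics computed from any single structure's predictions is one way to check whether additional coordinate sets help on a given benchmark.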

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Data Mining*
  • Databases, Protein*
  • Predictive Value of Tests
  • Protein Binding
  • ROC Curve
  • Sequence Homology, Amino Acid
  • Structural Homology, Protein
  • Thermodynamics

Grants and funding

We gratefully acknowledge funding from the Swedish Research Council, the Swedish Foundation for International Cooperation in Research and Higher Education, and the Lars Hierta Memorial Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.