An Atomistic Statistically Effective Energy Function for Computational Protein Design

J Chem Theory Comput. 2016 Aug 9;12(8):4146-68. doi: 10.1021/acs.jctc.6b00090. Epub 2016 Jul 20.

Abstract

Shortcomings in the definition of effective free-energy surfaces of proteins are recognized to be a major contributory factor responsible for the low success rates of existing automated methods for computational protein design (CPD). The formulation of an atomistic statistically effective energy function (SEEF) suitable for a wide range of CPD applications and its derivation from structural data extracted from protein domains and protein-ligand complexes are described here. The proposed energy function comprises nonlocal atom-based and local residue-based SEEFs, which are coupled using a novel atom connectivity number factor to scale short-range, pairwise, nonbonded atomic interaction energies and a surface-area-dependent cavity energy term. This energy function was used to derive additional SEEFs describing the unfolded-state ensemble of any given residue sequence based on computed average energies for partially or fully solvent-exposed fragments in regions of irregular structure in native proteins. Relative thermal stabilities of 97 T4 bacteriophage lysozyme mutants were predicted from calculated energy differences for folded and unfolded states with an average unsigned error (AUE) of 0.84 kcal mol⁻¹ when compared to experiment. To demonstrate the utility of the energy function for CPD, further validation was carried out in tests of its capacity to recover cognate protein sequences and to discriminate native and near-native protein folds, loop conformers, and small-molecule ligand binding poses from non-native benchmark decoys. Experimental ligand binding free energies for a diverse set of 80 protein complexes could be predicted with an AUE of 2.4 kcal mol⁻¹ using an additional energy term to account for the loss in ligand configurational entropy upon binding. The atomistic SEEF is expected to improve the accuracy of residue-based coarse-grained SEEFs currently used in CPD and to extend the range of applications of extant atom-based protein statistical potentials.
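
The stability evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sign convention (mutant minus wild type, each taken as folded minus unfolded energy), and all numeric values are assumptions made for illustration only. It shows how a relative stability could be formed from folded-state and unfolded-state energies, and how an average unsigned error (AUE) against experiment is computed.

```python
from statistics import mean


def relative_stability(e_folded_mut, e_unfolded_mut, e_folded_wt, e_unfolded_wt):
    """Relative stability of a mutant versus wild type, in kcal/mol.

    Assumed convention (hypothetical, not taken from the paper):
    ddG = (E_folded - E_unfolded)_mutant - (E_folded - E_unfolded)_wild_type
    """
    return (e_folded_mut - e_unfolded_mut) - (e_folded_wt - e_unfolded_wt)


def average_unsigned_error(predicted, experimental):
    """Mean absolute difference between predicted and experimental values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(p - e) for p, e in zip(predicted, experimental))


# Made-up energies for a single hypothetical mutant (not data from the paper).
ddg = relative_stability(
    e_folded_mut=-10.0, e_unfolded_mut=-4.0,
    e_folded_wt=-12.0, e_unfolded_wt=-5.0,
)

# Made-up predicted and experimental values for two hypothetical mutants.
aue = average_unsigned_error([1.0, -2.0], [0.5, -1.0])
```

The same `average_unsigned_error` helper applies unchanged to the ligand binding free energy comparison, since AUE depends only on paired predicted and experimental values.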

MeSH terms

  • Bacteriophage T4 / enzymology
  • Databases, Protein
  • Humans
  • Ligands
  • Muramidase / chemistry*
  • Muramidase / genetics
  • Muramidase / metabolism
  • Mutagenesis
  • Polymorphism, Single Nucleotide
  • Protein Conformation
  • Protein Folding
  • Protein Unfolding
  • Superoxide Dismutase-1 / chemistry
  • Superoxide Dismutase-1 / genetics
  • Superoxide Dismutase-1 / metabolism
  • Thermodynamics
  • Viral Proteins / chemistry*
  • Viral Proteins / genetics
  • Viral Proteins / metabolism

Substances

  • Ligands
  • Viral Proteins
  • Superoxide Dismutase-1
  • Muramidase