Perceptron learning of pairwise contact energies for proteins incorporating the amino acid environment

Phys Rev E Stat Nonlin Soft Matter Phys. 2005 Jul;72(1 Pt 1):011906. doi: 10.1103/PhysRevE.72.011906. Epub 2005 Jul 12.

Abstract

Although a coarse-grained description of proteins is a simple and convenient way to attack the protein folding problem, the construction of a global pairwise energy function that can simultaneously recognize the native folds of many proteins has met with only partial success. We have explored whether this pairwise contact energy function can be improved systematically by extending the parameter space of amino acids beyond a 20 x 20 matrix to incorporate the local environments of the amino acids. We have studied pairwise contact energy functions with 20 x 20, 60 x 60, and 180 x 180 matrices, depending on the extent of the parameter space, and compared their effect on the learnability of the energy parameters in the context of gapless threading, bearing in mind that a 20 x 20 pairwise contact matrix has been shown to be too simple to recognize the native folds of many proteins. In this paper, we show that a global pairwise energy function can be constructed using 1006 training proteins with less than 30% homology, which include representatives of all the different protein classes. After the local environments of the amino acids were parametrized into nine categories, defined by three secondary structures and three kinds of hydrophobicity (desolvation), the 16290 pairwise contact energies (scores) of the resulting 180 residue types could be determined by perceptron learning and protein threading. These energies could simultaneously recognize the native folds of all 1006 training proteins. When these energy parameters were tested on 382 test proteins with less than 90% homology, the native folds of 370 (96.9%) proteins were recognized. We set up a simple thermodynamic framework in the conformational space of decoys to calculate the unfolded fraction and the specific heat of real proteins. The different thermodynamic stabilities of E. coli ribonuclease H (RNase H) and its mutants were well described by our calculation, in agreement with experiment.
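
The parameter count and the learning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the energy of a conformation is linear in the contact energies (energy = w · n, where n counts the contacts of each residue-type pair), and it replaces real gapless-threading decoys with synthetic count vectors. The names `make_toy_data` and `perceptron_train`, the toy dimension, and the energy gap used to build the data are all hypothetical.

```python
import numpy as np

# Parameter counting, following the abstract: 20 amino acids, each in one of
# 3 secondary-structure classes and 3 hydrophobicity (desolvation) classes.
N_TYPES = 20 * 3 * 3                      # 180 environment-dependent types
N_PARAMS = N_TYPES * (N_TYPES + 1) // 2   # independent entries of a symmetric matrix


def make_toy_data(n_proteins=20, n_candidates=11, dim=10, min_gap=0.5, seed=0):
    """Build synthetic (assumed) contact-count vectors that are separable.

    A hidden unit-norm energy vector picks the lowest-energy candidate of each
    protein as its "native"; candidates at least `min_gap` above it are decoys.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=dim)
    w_true /= np.linalg.norm(w_true)
    natives, decoys = [], []
    for _ in range(n_proteins):
        cand = rng.integers(0, 5, size=(n_candidates, dim)).astype(float)
        energy = cand @ w_true
        i_nat = int(np.argmin(energy))
        keep = energy - energy[i_nat] >= min_gap   # excludes the native itself
        natives.append(cand[i_nat])
        decoys.append(cand[keep])
    return np.array(natives), decoys


def perceptron_train(natives, decoys, n_epochs=1000, lr=1.0):
    """Seek w such that every decoy has a higher energy than its native."""
    w = np.zeros(natives.shape[1])
    n_errors = 0
    for _ in range(n_epochs):
        n_errors = 0
        for x_nat, x_decs in zip(natives, decoys):
            for x_dec in x_decs:
                diff = x_dec - x_nat
                if w @ diff <= 0:          # decoy not yet above the native
                    w += lr * diff         # standard perceptron update
                    n_errors += 1
        if n_errors == 0:                  # all natives recognized
            break
    return w, n_errors


natives, decoys = make_toy_data()
w, n_errors = perceptron_train(natives, decoys)
print(N_TYPES, N_PARAMS, n_errors)
```

In this sketch, training stops once a full pass over all native/decoy pairs produces no update, which corresponds to recognizing every native fold in the toy training set. The toy problem uses only 10 parameters; the 180 x 180 case in the abstract would use a vector of length `N_PARAMS`.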

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry*
  • Biophysics / methods
  • Databases, Protein
  • Escherichia coli / enzymology
  • Models, Statistical
  • Molecular Structure
  • Mutation
  • Neural Networks, Computer
  • Protein Conformation
  • Protein Folding
  • Ribonuclease H / chemistry
  • Software
  • Thermodynamics

Substances

  • Amino Acids
  • Ribonuclease H
  • ribonuclease HI