Computational basis of knowledge-based conformational probabilities derived from local- and long-range interactions in proteins

Proteins. 2007 Jan 1;66(1):29-40. doi: 10.1002/prot.21206.

Abstract

The probabilities of the various basins in Ramachandran maps are examined critically. The theoretical basis of probability calculations, both from molecular computations and from protein libraries, is discussed. The well-defined basins of the Ramachandran maps are treated as rotational isomeric states. Statistical independence and dependence of the states of different residues along the peptide chain are discussed. The Flory isolated pair hypothesis, near-neighbor correlations, context effects, and long-range correlations are examined critically. A method of evaluating long-range correlations in helical and extended sequences is introduced, in analogy with earlier polymer theory. Three different protein libraries are constructed, using data from residues in (i) coiled regions, (ii) all regions, and (iii) only the helical and extended regions of proteins. Singlet and pairwise dependent probabilities calculated from these libraries are used to predict whether a given sequence is helical or extended. Predictions using pairwise dependence were not better than those using singlet probabilities. Modeling of long-range correlations improved the predictions significantly. Removal of chameleon sequences from the data set also improved the predictions, but to a lesser extent.
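To make the singlet versus pairwise-dependent distinction concrete, here is a minimal sketch under stated assumptions. It is not the authors' method. It reduces the Ramachandran basins to two hypothetical states, "H" (helical) and "E" (extended). It builds probability tables from a made-up toy library. The sequences in TOY_LIBRARY are illustrative only and are not data from the paper. The singlet score assumes each residue's state depends only on its own type. The pairwise score instead conditions each residue's state on its right-hand neighbor, one assumed reading of "pairwise dependent". The names `build_tables`, `singlet_prob`, `pair_prob`, and `predict`, and the add-one smoothing, are hypothetical choices. Long-range correlation modeling is not shown, since the abstract does not specify it.

```python
import math
from collections import Counter

# Hypothetical toy library of (sequence, per-residue state) pairs.
# "H" = helical basin, "E" = extended basin. Illustrative values only;
# these are NOT data from the paper's libraries.
TOY_LIBRARY = [
    ("AELKA", "HHHHH"),
    ("VTVIV", "EEEEE"),
    ("AEAKL", "HHHHH"),
    ("TVYVI", "EEEEE"),
    ("LAEVT", "HHHEE"),
]
STATES = ("H", "E")


def build_tables(library):
    """Count states per residue type (singlet) and per
    (residue, right neighbor) context (pairwise dependent)."""
    singlet = Counter()
    pair = Counter()
    for seq, states in library:
        for i, (aa, s) in enumerate(zip(seq, states)):
            singlet[(aa, s)] += 1
            if i + 1 < len(seq):
                pair[(aa, seq[i + 1], s)] += 1
    return singlet, pair


def singlet_prob(singlet, aa, s, pseudo=1.0):
    """P(state s | residue type aa) with add-pseudo smoothing (assumed)."""
    total = sum(singlet[(aa, t)] for t in STATES)
    return (singlet[(aa, s)] + pseudo) / (total + pseudo * len(STATES))


def pair_prob(singlet, pair, aa, nxt, s, pseudo=1.0):
    """P(state s of aa | aa followed by nxt); falls back to the singlet
    estimate when the (aa, nxt) context never occurs in the library."""
    total = sum(pair[(aa, nxt, t)] for t in STATES)
    if total == 0:
        return singlet_prob(singlet, aa, s, pseudo)
    return (pair[(aa, nxt, s)] + pseudo) / (total + pseudo * len(STATES))


def predict(seq, singlet, pair, use_pairs=False):
    """Return the state ("H" or "E") with the larger summed log-probability,
    treating residues as independent given their (optional) neighbor context."""
    scores = {}
    for s in STATES:
        score = 0.0
        for i, aa in enumerate(seq):
            if use_pairs and i + 1 < len(seq):
                p = pair_prob(singlet, pair, aa, seq[i + 1], s)
            else:
                p = singlet_prob(singlet, aa, s)  # last residue has no right neighbor
            score += math.log(p)
        scores[s] = score
    return max(scores, key=scores.get)


singlet, pair = build_tables(TOY_LIBRARY)
for query in ("AEAK", "VTVI"):
    print(query, predict(query, singlet, pair), predict(query, singlet, pair, use_pairs=True))
```

With this toy library both scoring schemes print `AEAK H H` and `VTVI E E`. Summing log-probabilities avoids underflow for long sequences. The singlet fallback keeps the pairwise estimate defined for neighbor contexts that are absent from a small library.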

Publication types

  • Comparative Study
  • Evaluation Study

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Amino Acids / metabolism
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Protein
  • Knowledge Bases
  • Models, Statistical*
  • Molecular Sequence Data
  • Probability
  • Protein Conformation*
  • Protein Folding
  • Proteins / chemistry
  • Proteins / metabolism

Substances

  • Amino Acids
  • Proteins