A tabular approach to the sequence-to-structure relation in proteins (tetrapeptide representation) for de novo protein design

Med Sci Monit. 2006 Jun;12(6):BR208-14. Epub 2006 May 29.

Abstract

Background: Experimental observations classify protein folding as a multi-step process. The backbone conformation has been experimentally recognized as responsible for the early-stage structural forms of a polypeptide. The sequence-to-structure and structure-to-sequence relations are critical for predicting protein structure. A contingency table representing these relations for tetrapeptides in their early-stage structural forms is presented. The correlation between sequence and structure appears to be essential in protein-folding simulation.

Material/methods: The polypeptide chains of all the proteins in the Protein Data Bank were transformed into their early-stage structural forms. The tetrapeptide was selected as the structural unit. Tetrapeptide sequences and structures were expressed by letter codes. A contingency table of any size (here: 160,000x2401) was collapsed to a 2x2 table for each non-zero cell of the original table, which allowed calculation of the rho-coefficient, a measure of the strength of the relation.
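
The collapse of a large contingency table to one 2x2 table per non-zero cell can be sketched as follows. This is an illustrative sketch only: the abstract does not give the formula for the rho-coefficient, so the code substitutes the standard phi coefficient for a 2x2 table as a stand-in measure of association. The function names (phi, rank_cells) and the example letter codes are hypothetical. The table size of 160,000x2401 matches 20^4 by 7^4, which would correspond to four-letter sequence codes over 20 amino acids and four-letter structure codes over 7 symbols, although the abstract does not state this.

```python
from collections import Counter
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient of a 2x2 table [[a, b], [c, d]].

    Assumption: used here as a stand-in for the rho-coefficient,
    whose formula is not given in the abstract.
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def rank_cells(pairs):
    """Rank every non-zero cell of a sparse contingency table.

    pairs: iterable of (sequence_code, structure_code) observations.
    Returns [((sequence_code, structure_code), score), ...], best first.
    """
    table = Counter(pairs)          # only non-zero cells are stored
    row_totals = Counter()          # observations per sequence code
    col_totals = Counter()          # observations per structure code
    for (seq, struct), n in table.items():
        row_totals[seq] += n
        col_totals[struct] += n
    total = sum(table.values())

    ranking = []
    for (seq, struct), a in table.items():
        b = row_totals[seq] - a     # same sequence, other structures
        c = col_totals[struct] - a  # other sequences, same structure
        d = total - a - b - c       # everything else
        ranking.append(((seq, struct), phi(a, b, c, d)))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking


if __name__ == "__main__":
    # Made-up observations, not data from the study.
    observations = (
        [("AAAA", "CCCC")] * 4
        + [("GGGG", "EEEE")] * 4
        + [("GGGG", "CCCC")] * 1
    )
    for cell, score in rank_cells(observations):
        print(cell, round(score, 3))
```

Storing only the observed (sequence, structure) pairs keeps the sketch practical for a table of this size, since most of the 160,000x2401 cells would be expected to be empty.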

Results: High values of the rho-coefficient identified sequences of strong structural determinability and structures of high sequence selectivity. A web-based program that calculates the rho-coefficient ranking list was constructed so that the method can be applied to any problem of contingency table analysis.

Conclusions: The results revealed sequence-to-structure (and vice versa) correlation in early-stage folding. Surprisingly, the irregular structural forms of loops and bends appeared to be highly determined. Comparison of these results with those of another method, based on information entropy, showed close agreement. The method, which is oriented toward the interpretation of large contingency tables, seems particularly useful for large-scale microarray analysis, a very popular technique in the post-genomic era.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Databases, Protein*
  • Molecular Sequence Data
  • Peptides / chemistry*
  • Protein Conformation
  • Protein Folding
  • Proteins / chemistry*

Substances

  • Peptides
  • Proteins