On predicting foldability of a protein from its sequence

Proteins. 2020 Feb;88(2):355-365. doi: 10.1002/prot.25811. Epub 2019 Oct 3.

Abstract

Several properties of amino acid sequences of proteins known to fold were compared with those of randomly generated sequences and of intrinsically disordered proteins, in order to find properties that distinguish folding sequences from the rest. The properties studied included helix and sheet propensities from secondary structure prediction, adjacency correlations, directionality correlations, and the propensities of all possible triplets and quadruplets. Small differences between known folded and random sequences were observed for the adjacency and directionality correlations, and significant differences were seen in the triplet and especially the quadruplet propensities. Based on the differences in the adjacency, triplet, or quadruplet propensities, folding scores were defined and used to test the accuracy of foldability prediction from these statistics. The best predictions were obtained from the quadruplet propensities.
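The abstract does not give the formula for the folding scores, so the following is only a minimal sketch of one way a propensity-based score could be built, not the authors' method. It assumes the score of a sequence is the mean log-odds of its k-residue windows (k = 3 for triplets, k = 4 for quadruplets), comparing window frequencies in a set of known folded sequences with those in a background set such as random sequences. The function names, the additive pseudocount, and the averaging are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import log

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequences, k):
    """Count every overlapping window of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_odds_table(folded, background, k, pseudocount=1.0):
    """Return a function mapping a k-mer to log(P_folded / P_background).

    Assumption: additive (Laplace) smoothing over all 20**k possible
    k-mers, so that unseen k-mers get a finite score.
    """
    f = kmer_counts(folded, k)
    b = kmer_counts(background, k)
    n_kmers = len(AMINO_ACIDS) ** k
    f_denom = sum(f.values()) + pseudocount * n_kmers
    b_denom = sum(b.values()) + pseudocount * n_kmers

    def score(kmer):
        p_folded = (f[kmer] + pseudocount) / f_denom
        p_background = (b[kmer] + pseudocount) / b_denom
        return log(p_folded / p_background)

    return score


def folding_score(seq, kmer_score, k):
    """Mean log-odds over all windows of length k (hypothetical score).

    Returns 0.0 for sequences shorter than k, which have no windows.
    """
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    return sum(kmer_score(seq[i:i + k]) for i in range(n)) / n
```

Under these assumptions, a positive score means the sequence's windows are, on average, more typical of the folded set than of the background, and a threshold on the score would give a foldability prediction. Real use would require large reference sets; with k = 4 there are 160,000 possible quadruplets, so sparse counts make the smoothing choice matter.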

Keywords: protein foldability; residue correlation; residue propensity; secondary structure prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acids
  • Databases, Protein
  • Intrinsically Disordered Proteins / chemistry*
  • Intrinsically Disordered Proteins / genetics
  • Protein Folding*
  • Protein Structure, Secondary*
  • Proteins / chemistry*
  • Proteins / genetics
  • Sequence Homology, Amino Acid

Substances

  • Amino Acids
  • Intrinsically Disordered Proteins
  • Proteins