Characterizing the existing and potential structural space of proteins by large-scale multiple loop permutations

J Mol Biol. 2011 May 6;408(3):585-95. doi: 10.1016/j.jmb.2011.02.056. Epub 2011 Mar 2.

Abstract

Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks is also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space are limited to simple model proteins and led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel potentially stable clusters for medium-size or large-size single-domain proteins, in particular, >100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed. Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cluster Analysis
  • Computational Biology / methods*
  • Models, Molecular
  • Protein Engineering / methods*
  • Protein Folding
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / chemistry*

Substances

  • Proteins