Protein fragment clustering and canonical local shapes

Proteins. 2003 Mar 1;50(4):580-8. doi: 10.1002/prot.10309.

Abstract

A novel clustering method is used to cluster protein fragments by shape. The centroids (mean fragments from each cluster) form a basis set of structural motifs. A database of 156,643 seven-residue fragments is used, and eight different basis sets with varying levels of resolution are generated. Coarse basis sets contain tens of centroids and provide meaningful local shapes, which are more detailed than the traditional secondary structure categories. High-resolution basis sets contain thousands of centroids and can be used to model tertiary structure of longer segments. The generated basis sets fit proteins outside the training set with the expected accuracy.
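The idea of clustering fragments by shape and taking cluster means as a basis set can be illustrated with a minimal sketch. The abstract does not describe the paper's clustering method, so the example substitutes plain k-means (Lloyd's algorithm) as a stand-in. It also assumes, for simplicity, that a fragment's shape is summarized by its pairwise point distances rather than by superimposed coordinates, so each centroid here is a mean distance vector, not a mean fragment, and mirror images are not distinguished. All function names and the toy "strand" and "helix" geometries are hypothetical.

```python
import math
import random


def shape_descriptor(fragment):
    """Return all pairwise point distances of a fragment (rotation/translation invariant)."""
    n = len(fragment)
    return [math.dist(fragment[i], fragment[j])
            for i in range(n) for j in range(i + 1, n)]


def descriptor_rmsd(a, b):
    """Root-mean-square difference between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def assign(descriptors, centroids):
    """Label each descriptor with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda c: descriptor_rmsd(d, centroids[c]))
            for d in descriptors]


def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; a stand-in, not the paper's method."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        labels = assign(descriptors, centroids)
        for c in range(k):
            members = [d for d, lab in zip(descriptors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(descriptors, centroids)


def toy_fragment(kind, rng, length=7, noise=0.1):
    """Hypothetical seven-point fragment: 'strand' (straight) or 'helix' (coiled)."""
    points = []
    for i in range(length):
        if kind == "strand":
            x, y, z = 3.8 * i, 0.0, 0.0
        else:
            angle = math.radians(100.0 * i)
            x, y, z = 2.3 * math.cos(angle), 2.3 * math.sin(angle), 1.5 * i
        points.append((x + rng.gauss(0, noise),
                       y + rng.gauss(0, noise),
                       z + rng.gauss(0, noise)))
    return points


# Toy data set: five noisy straight fragments followed by five noisy coiled ones.
rng = random.Random(1)
fragments = ([toy_fragment("strand", rng) for _ in range(5)] +
             [toy_fragment("helix", rng) for _ in range(5)])
descriptors = [shape_descriptor(f) for f in fragments]
centroids, labels = kmeans(descriptors, k=2)
```

In this sketch the number of clusters `k` plays the role of resolution: a small `k` yields a coarse set of representative shapes, while a larger `k` on a larger fragment set would give a finer one, loosely mirroring the coarse and high-resolution basis sets described above.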

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Cluster Analysis*
  • Databases, Protein
  • Models, Molecular*
  • Molecular Structure
  • Peptide Fragments / chemistry*
  • Protein Structure, Secondary*
  • Proteins / chemistry*
  • Reproducibility of Results
  • Structural Homology, Protein

Substances

  • Peptide Fragments
  • Proteins