Protein fold similarity estimated by a probabilistic approach based on C(alpha)-C(alpha) distance comparison

J Mol Biol. 2002 Jan 25;315(4):887-98. doi: 10.1006/jmbi.2001.5250.

Abstract

The distribution of the C(alpha)-C(alpha) distances between residues separated by 3 to 30 amino acid residues is highly characteristic of protein folds and makes it possible to identify them from a straightforward comparison of the distance histograms. The comparison is carried out by contingency table analysis and yields a probability of identity (PRIDE score), with values between 0 and 1. For closely related structures, PRIDE is highly correlated with the root-mean-square distance between C(alpha) atoms, but it provides a correct classification even for unrelated structures for which a structural alignment is not meaningful. For example, an analysis of the CATH database of fold structures showed that 98.8% of the folds fall into the correct CATH homologous superfamily category, based on the highest PRIDE score obtained. Structural alignment and secondary-structure assignment are not necessary for the calculation of PRIDE, which is fast enough to allow the scanning of large databases.
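
The histogram comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract specifies only the sequence separations (3 to 30), the use of distance histograms, and a contingency table analysis that yields a probability. Everything else here is an assumption made for illustration: the 0.5 Å bin width, the use of a Pearson chi-square test on a 2 x B table, the averaging of the per-separation probabilities into a single score, and all function names (`distance_histogram`, `histogram_match_probability`, `pride_like_score`, `chi2_sf`), which are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable (stdlib only)."""
    if x <= 0 or dof <= 0:
        return 1.0
    a, h = dof / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, h).
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16 and n < 100000:
        term *= h / (a + n)
        total += term
        n += 1
    log_p = a * math.log(h) - h - math.lgamma(a) + math.log(total)
    return min(1.0, max(0.0, 1.0 - math.exp(log_p)))


def distance_histogram(coords, sep, bin_width=0.5):
    """Histogram of C-alpha(i) to C-alpha(i + sep) distances.

    coords is a list of (x, y, z) tuples in Angstrom, one per residue.
    The bin width is an assumed value, not taken from the abstract.
    """
    counts = Counter()
    for i in range(len(coords) - sep):
        d = math.dist(coords[i], coords[i + sep])
        counts[int(d / bin_width)] += 1
    return counts


def histogram_match_probability(h1, h2):
    """Chi-square test on the 2 x B contingency table built from two histograms.

    Returns None when either histogram is empty (chain shorter than sep).
    """
    n1, n2 = sum(h1.values()), sum(h2.values())
    if n1 == 0 or n2 == 0:
        return None
    grand = n1 + n2
    bins = set(h1) | set(h2)  # bins empty in both histograms are ignored
    chi2 = 0.0
    for b in bins:
        col = h1[b] + h2[b]
        for observed, row in ((h1[b], n1), (h2[b], n2)):
            expected = row * col / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2_sf(chi2, len(bins) - 1)


def pride_like_score(coords_a, coords_b, min_sep=3, max_sep=30):
    """Average the per-separation probabilities (assumed combination rule)."""
    probs = []
    for sep in range(min_sep, max_sep + 1):
        p = histogram_match_probability(
            distance_histogram(coords_a, sep),
            distance_histogram(coords_b, sep),
        )
        if p is not None:
            probs.append(p)
    return sum(probs) / len(probs) if probs else 0.0
```

With C(alpha) coordinates read from two structures into lists of `(x, y, z)` tuples, `pride_like_score(coords_a, coords_b)` returns a value between 0 and 1. No alignment or secondary-structure assignment is involved, so the cost is linear in chain length for each separation, which is consistent with the abstract's point about scanning large databases.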

MeSH terms

  • Animals
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Protein
  • Evolution, Molecular
  • Humans
  • Models, Molecular
  • Probability*
  • Protein Folding*
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / classification*
  • Sequence Homology
  • Software

Substances

  • Proteins