Protein fold similarity estimated by a probabilistic approach based on C(alpha)-C(alpha) distance comparison

Oliviero Carugo; Sándor Pongor

doi:10.1006/jmbi.2001.5250

J Mol Biol. 2002 Jan 25;315(4):887-98. doi: 10.1006/jmbi.2001.5250.

Authors

Oliviero Carugo¹, Sándor Pongor

Affiliation

¹ Protein Structure and Function Group, International Centre for Genetic Engineering and Biotechnology, Area Science Park, Padriciano 99, Trieste, 34012, Italy. carugo@icgeb.trieste.it

PMID: 11812155

Abstract

The distribution of the C(alpha)-C(alpha) distances between residues separated by 3 to 30 amino acid residues is highly characteristic of protein folds and makes it possible to identify them from a straightforward comparison of the distance histograms. The comparison is carried out by contingency table analysis and yields a probability of identity (PRIDE score), with values between 0 and 1. For closely related structures, PRIDE is highly correlated with the root-mean-square distance between C(alpha) atoms, but it provides a correct classification even for unrelated structures for which a structural alignment is not meaningful. For example, an analysis of the CATH database of fold structures showed that 98.8% of the folds fall into the correct CATH homologous superfamily category, based on the highest PRIDE score obtained. Structural alignment and secondary-structure assignment are not necessary for the calculation of PRIDE, which is fast enough to allow the scanning of large databases.
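
The histogram comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract specifies only the separation range (3 to 30 residues), the use of distance histograms, and contingency table analysis; everything else here is an assumption made for illustration: the 1 Å bin width, the 100-bin cap, the chi-square test on a 2 x k table, the averaging of the per-separation probabilities into a single score, and all function names (`ca_distance_histogram`, `chi2_sf`, `identity_probability`, `pride_score`). The two toy chains are idealized geometry, not real structures.

```python
import math

def ca_distance_histogram(coords, sep, bin_width=1.0, n_bins=100):
    """Histogram of distances between C-alpha atoms i and i + sep.
    Bin width and bin count are illustrative assumptions."""
    hist = [0] * n_bins
    for a, b in zip(coords, coords[sep:]):
        k = min(int(math.dist(a, b) / bin_width), n_bins - 1)  # clamp overflow
        hist[k] += 1
    return hist

def chi2_sf(chi2, dof):
    """Upper-tail chi-square probability via the incomplete-gamma series."""
    a, x = dof / 2.0, chi2 / 2.0
    if x <= 0.0:
        return 1.0
    if x > 600.0:  # far tail for the small dof used here; avoids overflow
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

def identity_probability(h1, h2):
    """P-value of a chi-square test on the 2 x k table formed by two histograms."""
    cols = [(a, b) for a, b in zip(h1, h2) if a + b > 0]  # drop empty bins
    n1 = sum(a for a, _ in cols)
    n2 = sum(b for _, b in cols)
    if n1 == 0 or n2 == 0:
        raise ValueError("both histograms must contain at least one distance")
    if len(cols) < 2:
        return 1.0  # all distances share one bin: indistinguishable
    total = n1 + n2
    chi2 = 0.0
    for a, b in cols:
        e1 = n1 * (a + b) / total  # expected count, row 1
        e2 = n2 * (a + b) / total  # expected count, row 2
        chi2 += (a - e1) ** 2 / e1 + (b - e2) ** 2 / e2
    return chi2_sf(chi2, len(cols) - 1)

def pride_score(coords1, coords2, min_sep=3, max_sep=30):
    """Mean identity probability over all separations (averaging is assumed)."""
    if min(len(coords1), len(coords2)) <= max_sep:
        raise ValueError("both chains must be longer than max_sep residues")
    probs = [identity_probability(ca_distance_histogram(coords1, s),
                                  ca_distance_histogram(coords2, s))
             for s in range(min_sep, max_sep + 1)]
    return sum(probs) / len(probs)

# Toy C-alpha traces: an idealized helix and an idealized extended strand.
helix = [(2.3 * math.cos(math.radians(100 * i)),
          2.3 * math.sin(math.radians(100 * i)), 1.5 * i) for i in range(40)]
strand = [(3.3 * i, 0.9 * (i % 2), 0.0) for i in range(40)]
print(pride_score(helix, helix), pride_score(helix, strand))
```

Because the sketch works only on per-chain histograms, it needs neither a structural alignment nor a secondary-structure assignment, which is the property the abstract highlights. A database scan in this spirit would compute the histograms once per structure and rank candidates by score.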

MeSH terms

Animals
Cluster Analysis
Computational Biology / methods*
Databases, Protein
Evolution, Molecular
Humans
Models, Molecular
Probability*
Protein Folding*
Protein Structure, Secondary
Protein Structure, Tertiary
Proteins / chemistry*
Proteins / classification*
Sequence Homology
Software

Substances

Proteins