The distribution of the C(alpha)-C(alpha) distances between residues separated by 3 to 30 amino acid residues is highly characteristic of protein folds and makes it possible to identify folds by a straightforward comparison of the distance histograms. The comparison is carried out by contingency table analysis and yields a probability of identity (the PRIDE score), with values between 0 and 1. For closely related structures, PRIDE is highly correlated with the root-mean-square distance between C(alpha) atoms, but it also provides a correct classification for unrelated structures, for which a structural alignment is not meaningful. For example, an analysis of the CATH database of fold structures showed that 98.8% of the folds fall into the correct CATH homologous superfamily category, based on the highest PRIDE score obtained. The calculation of PRIDE requires neither structural alignment nor secondary-structure assignment, and it is fast enough to allow the scanning of large databases.
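The procedure described above can be illustrated with a minimal Python sketch. It is not the published implementation: the bin width, the number of bins, the use of a Pearson chi-square test on a 2 x k contingency table, and the averaging of the per-separation probabilities into one score are all assumptions made for illustration, and the function names (`ca_distance_histogram`, `chi2_sf`, `histogram_identity_probability`, `pride_like_score`) are hypothetical. Input chains are assumed to be lists of C(alpha) coordinates in angstroms, in sequence order.

```python
import math


def ca_distance_histogram(coords, sep, bin_width=1.0, n_bins=100):
    """Histogram of distances between C(alpha) i and C(alpha) i+sep.

    bin_width and n_bins are illustrative assumptions; distances beyond
    the last bin edge are counted in the last bin.
    """
    hist = [0] * n_bins
    for i in range(len(coords) - sep):
        d = math.dist(coords[i], coords[i + sep])
        hist[min(int(d / bin_width), n_bins - 1)] += 1
    return hist


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution (df >= 1).

    Uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1), starting
    from Q(1, h) = exp(-h) or Q(1/2, h) = erfc(sqrt(h)), with h = x / 2.
    """
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def histogram_identity_probability(h1, h2):
    """Pearson chi-square test on the 2 x k table formed by two histograms."""
    cols = [(a, b) for a, b in zip(h1, h2) if a + b > 0]  # drop empty bins
    n1 = sum(a for a, _ in cols)
    n2 = sum(b for _, b in cols)
    if n1 == 0 or n2 == 0 or len(cols) < 2:
        return 1.0  # nothing to distinguish the two histograms
    total = n1 + n2
    chi2 = 0.0
    for a, b in cols:
        e1 = n1 * (a + b) / total  # expected count, chain 1
        e2 = n2 * (a + b) / total  # expected count, chain 2
        chi2 += (a - e1) ** 2 / e1 + (b - e2) ** 2 / e2
    return chi2_sf(chi2, len(cols) - 1)


def pride_like_score(coords1, coords2, min_sep=3, max_sep=30):
    """Average the per-separation probabilities (assumed combination rule)."""
    probs = [
        histogram_identity_probability(
            ca_distance_histogram(coords1, n),
            ca_distance_histogram(coords2, n),
        )
        for n in range(min_sep, max_sep + 1)
    ]
    return sum(probs) / len(probs)
```

In this sketch, two identical chains give a chi-square statistic of zero at every separation and therefore a score of 1, while chains whose distance histograms do not overlap give a score close to 0. No alignment or secondary-structure assignment enters the calculation, only the ordered C(alpha) coordinates of each chain.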
Copyright 2002 Academic Press.