Correlated substitution analysis and the prediction of amino acid structural contacts

Brief Bioinform. 2008 Jan;9(1):46-56. doi: 10.1093/bib/bbm052. Epub 2007 Nov 13.

Abstract

It has long been suspected that analysis of correlated amino acid substitutions should uncover pairs or clusters of sites that are spatially proximal in mature protein structures. Accordingly, methods based on different mathematical principles, such as information theory, correlation coefficients and maximum likelihood, have been developed to identify co-evolving amino acids from multiple sequence alignments. Sets of pairs of sites whose behaviour is identified by these methods as correlated are often significantly enriched in pairs of spatially proximal residues. However, relatively high levels of false-positive predictions typically render such methods, in isolation, of little use in the ab initio prediction of protein structure. Misleading signal (or problems with the estimation of significance levels) can arise from phylogenetic correlations between homologous sequences and from correlations due to factors other than spatial proximity (for example, correlation between sites that are not spatially close but that contribute to a common functional property of the protein). In recent years, several workers have suggested that information from correlated substitutions should be combined with other sources of information (secondary structure, solvent accessibility, evolutionary rates) to reduce the proportion of false-positive predictions. We review methods for the detection of correlated amino acid substitutions, compare their performance in contact prediction and predict future directions in the field.
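
As an illustration of the information-theoretic family of methods mentioned in the abstract, the sketch below scores pairs of alignment columns by mutual information. It is a minimal example under stated assumptions, not the procedure of any particular published method: the function names (`column`, `mutual_information`, `rank_pairs`) and the four-sequence toy alignment are hypothetical, gaps are not handled, and no correction is made for phylogenetic correlation between sequences, which the abstract identifies as a source of misleading signal.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Probabilities are plain observed frequencies; no pseudocounts,
    sequence weighting or phylogenetic correction is applied.
    """
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_pairs(alignment):
    """Score every pair of columns and return (score, i, j), highest first."""
    length = len(alignment[0])
    scores = []
    for i in range(length):
        for j in range(i + 1, length):
            score = mutual_information(column(alignment, i), column(alignment, j))
            scores.append((score, i, j))
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: columns 0 and 1 covary (A with K, G with R),
# column 2 is invariant, and column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]
ranked = rank_pairs(toy_alignment)
```

In this toy case the covarying pair of columns ranks first, while invariant or independently varying columns score zero. On real alignments a high score alone does not imply spatial contact, for the reasons given in the abstract, so such scores are better treated as one input among several than as a contact prediction.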

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Amino Acid Sequence
  • Amino Acid Substitution
  • Amino Acids / chemistry*
  • Binding Sites
  • Computer Simulation
  • Models, Chemical*
  • Models, Molecular*
  • Molecular Sequence Data
  • Protein Binding
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods*

Substances

  • Amino Acids
  • Proteins