Sequence comparison by sequence harmony identifies subtype-specific functional sites

Nucleic Acids Res. 2006;34(22):6540-8. doi: 10.1093/nar/gkl901. Epub 2006 Nov 27.

Abstract

Multiple sequence alignments are often used to reveal functionally important residues within a protein family. They can be particularly useful for identifying key residues that determine functional differences between protein subfamilies. We present a new entropy-based method, Sequence Harmony (SH), that accurately detects subfamily-specific positions from a multiple sequence alignment. The SH algorithm implements a novel formula that scores compositional differences between subfamilies without imposing conservation, in a simple manner and on an intuitive scale. We compare our method with the most important published methods, namely AMAS, TreeDet and SDP-pred, using three well-studied protein families: the receptor-binding domain (MH2) of the Smad family of transcription factors, the Ras superfamily of small GTPases and the MIP family of integral membrane transporters. We demonstrate that, for these test cases, SH selects known functional sites with higher coverage than the other methods. This shows that compositional differences between protein subfamilies provide a sufficient basis for identifying functional sites. In addition, SH selects a number of sites of unknown function that could be interesting candidates for further experimental investigation.
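The abstract does not state the SH formula itself. The sketch below is one plausible reading of "scores compositional differences between subfamilies without imposing conservation on an intuitive scale". It assumes (this is our assumption, not confirmed by the text above) that the per-column score is one minus the base-2 Jensen-Shannon divergence between the residue distributions of two subfamilies A and B. That gives 1.0 when both subfamilies have the same composition and 0.0 when they share no residue type. The function names `sequence_harmony` and `score_alignment` are hypothetical, and gap characters are simply counted as another symbol.

```python
import math
from collections import Counter


def column_frequencies(column):
    """Relative frequency of each symbol in one alignment column (gaps count as a symbol)."""
    counts = Counter(column)
    total = len(column)
    return {res: n / total for res, n in counts.items()}


def sequence_harmony(col_a, col_b):
    """Hypothetical SH-style score for one column split into subfamilies A and B.

    Assumed form: 1 - JSD_2(P_A, P_B), written as
    -1/2 * sum_i [pA_i log2(pA_i / (pA_i + pB_i)) + pB_i log2(pB_i / (pA_i + pB_i))].
    1.0 = identical composition, 0.0 = no residue type shared.
    """
    p_a = column_frequencies(col_a)
    p_b = column_frequencies(col_b)
    harmony = 0.0
    for res in set(p_a) | set(p_b):
        pa = p_a.get(res, 0.0)
        pb = p_b.get(res, 0.0)
        total = pa + pb
        # Terms with zero frequency contribute nothing (0 * log 0 is taken as 0).
        if pa > 0:
            harmony -= pa * math.log2(pa / total)
        if pb > 0:
            harmony -= pb * math.log2(pb / total)
    return 0.5 * harmony


def score_alignment(seqs_a, seqs_b):
    """Score every column; all aligned sequences are assumed to have equal length."""
    length = len(seqs_a[0])
    return [
        sequence_harmony([s[i] for s in seqs_a], [s[i] for s in seqs_b])
        for i in range(length)
    ]


if __name__ == "__main__":
    # Neither subfamily is conserved here, yet they share no residue type,
    # so this column still scores as fully subfamily-specific (0.0).
    print(sequence_harmony(["L", "I", "L", "I"], ["K", "R", "K", "R"]))
    # Identical composition in both subfamilies scores 1.0.
    print(sequence_harmony(["L", "L", "K", "K"], ["L", "L", "K", "K"]))
```

Under this reading, low-scoring columns would be the candidate subfamily-specific sites. The first example illustrates the "without imposing conservation" point: a column can score 0.0 even though each subfamily is itself variable.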

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Entropy
  • Membrane Transport Proteins / chemistry
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein*
  • Smad Proteins / chemistry
  • ras Proteins / chemistry

Substances

  • Membrane Transport Proteins
  • Smad Proteins
  • ras Proteins