RE-MuSiC: a tool for multiple sequence alignment with regular expression constraints

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W639-44. doi: 10.1093/nar/gkm275. Epub 2007 May 8.

Abstract

RE-MuSiC is a web-based multiple sequence alignment tool that can incorporate biological knowledge about the structure, function, or conserved patterns of the sequences of interest. It accepts amino acid or nucleic acid sequences and a set of constraints as input. The constraints are pattern descriptions rather than exact positions of the fragments to be aligned together. The output is an alignment in which, for each pattern (constraint), an occurrence in each sequence is aligned with the occurrences in the other sequences, such that the overall alignment is optimized. Its predecessor, MuSiC, has been found useful by researchers since its release in 2004. In applications, however, the pattern formulation adopted in MuSiC, namely plain strings allowing mismatches, proved insufficiently expressive and flexible. RE-MuSiC therefore adopts regular expressions as its constraint formulation, which are convenient for expressing many biologically significant patterns, such as those collected in the PROSITE database, or structural consensuses that often involve variable ranges between conserved parts. Experiments demonstrate that RE-MuSiC can help predict important residues and locate phylogenetically conserved structural elements. RE-MuSiC is available online at http://140.113.239.131/RE-MUSIC.
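
To illustrate what a regular expression constraint looks like in practice, the following minimal Python sketch translates a PROSITE-style pattern into a regular expression and locates its first occurrence in each sequence. It is not RE-MuSiC's implementation: the function names `prosite_to_regex` and `first_occurrences`, the two toy sequences, and the choice of a zinc-finger-style pattern are assumptions made for illustration only. The sketch only finds where a pattern occurs; it does not perform the constrained alignment or the optimization described above.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles the common syntax only: '-' separators, 'x' wildcards,
    [..] allowed sets, {..} forbidden sets, (n) / (n,m) repeats,
    and the '<' / '>' terminal anchors.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def first_occurrences(regex: str, sequences: dict) -> dict:
    """Return {name: (start, end)} for the first match in each sequence,
    or None for a sequence that contains no occurrence."""
    compiled = re.compile(regex)
    hits = {}
    for name, seq in sequences.items():
        m = compiled.search(seq)
        hits[name] = (m.start(), m.end()) if m else None
    return hits


if __name__ == "__main__":
    # Zinc-finger-style pattern written in PROSITE notation (illustrative).
    constraint = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
    # Toy sequences invented for this example, not real database entries.
    toy_sequences = {
        "seq1": "MKCAECGKAFSRSSHLTRHQRIHTG",
        "seq2": "GSYKCPDCGKSFSQKSNLQKHQRTHTGEK",
    }
    regex = prosite_to_regex(constraint)
    print(regex)
    for name, span in first_occurrences(regex, toy_sequences).items():
        print(name, span)
```

A constrained aligner would then need to place the matched segments of all sequences in the same alignment columns while optimizing the rest of the alignment; plain strings with mismatches cannot express the variable-length gaps such as `x(2,4)` that this pattern relies on.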

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods*
  • Conserved Sequence
  • Humans
  • Information Storage and Retrieval / methods*
  • Internet
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Proteins / genetics*
  • Sequence Alignment / methods*
  • Sequence Alignment / standards
  • Sequence Alignment / statistics & numerical data*
  • Sequence Analysis / methods*
  • Sequence Homology, Amino Acid
  • Software*
  • User-Computer Interface*

Substances

  • Proteins