Motif identification neural design for rapid and sensitive protein family search

Pac Symp Biocomput. 1996:674-85.

Abstract

The accelerated growth of molecular sequencing data has generated a pressing need for advanced sequence annotation tools. This paper reports a new method, termed MOTIFIND (Motif Identification Neural Design), for rapid and sensitive protein family identification. The method extends our previous gene classification artificial neural system and employs two new designs to enhance the detection of distant relationships: an n-gram term weighting algorithm for extracting local motif patterns, and integrated neural networks for combining global and local sequence information. The system has been tested with three protein families of electron transferases, namely cytochrome c, cytochrome b and flavodoxin, achieving 100% sensitivity and more than 99.6% specificity. The accuracy of MOTIFIND is comparable to that of the BLAST database search method, but MOTIFIND is more than 20 times faster. The system is much more robust than the PROSITE search, which is based on simple signature patterns. MOTIFIND also compares favorably with the BLIMPS search of BLOCKS in detecting fragmentary sequences that lack complete motif regions. The method has the potential to become a full-scale database search and sequence analysis tool.
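The abstract does not give the weighting formula, so the following is only a minimal sketch of what an n-gram term weighting step could look like. It assumes that each n-gram is weighted by the ratio of its count in known motif regions to its count in the full-length training sequences. The function names (`ngram_counts`, `term_weights`, `weighted_vector`), the choice of n = 2, and the ratio itself are illustrative assumptions, not the published MOTIFIND algorithm.

```python
from collections import Counter


def ngram_counts(seq, n=2):
    """Count overlapping n-grams (length-n substrings) in a protein sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def term_weights(motif_regions, full_sequences, n=2):
    """Hypothetical weighting: motif-region count / full-length count per n-gram.

    This ratio is an assumption made for illustration; the abstract does not
    state the formula used by MOTIFIND.
    """
    motif_total = Counter()
    for region in motif_regions:
        motif_total.update(ngram_counts(region, n))

    full_total = Counter()
    for seq in full_sequences:
        full_total.update(ngram_counts(seq, n))

    # Only n-grams seen in the full-length set get a weight, so the
    # denominator is never zero. Unseen motif n-grams count as 0.
    return {gram: motif_total[gram] / full_total[gram] for gram in full_total}


def weighted_vector(seq, weights, n=2):
    """Scale a sequence's n-gram counts by the weights (0.0 if unknown)."""
    counts = ngram_counts(seq, n)
    return {gram: count * weights.get(gram, 0.0) for gram, count in counts.items()}


if __name__ == "__main__":
    # Toy data only; these are not real family members or motif annotations.
    full = ["MKCAACHGG"]
    motifs = ["CAACH"]
    w = term_weights(motifs, full)
    print(weighted_vector("CACA", w))
```

In a design like the one described, a weighted vector of this kind would serve as the local (motif) input to a neural network, alongside an unweighted n-gram vector carrying global sequence information. The network itself is not sketched here because the abstract gives no architectural details.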

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Sequence*
  • Cytochrome b Group / chemistry
  • Cytochrome c Group / chemistry
  • Databases, Factual*
  • Flavodoxin / chemistry
  • Neural Networks, Computer
  • Proteins / chemistry*
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Cytochrome b Group
  • Cytochrome c Group
  • Flavodoxin
  • Proteins