Fast and accurate database homology search using upper bounds of local alignment scores

Bioinformatics. 2005 Apr 1;21(7):912-21. doi: 10.1093/bioinformatics/bti076. Epub 2004 Oct 27.

Abstract

Motivation: It is widely recognized that homology search and ortholog clustering are very useful for analyzing biological sequences. However, the recent growth in sequence database size makes homolog detection difficult, so rapid and accurate methods are required.

Results: We present a novel method for fast and accurate homology detection, assuming that the Smith-Waterman (SW) scores between all similar sequence pairs in a target database are computed and stored. In this method, SW alignment is computed only if the upper bound, which is derived from our novel inequality, is higher than the given threshold. In contrast to other methods such as FASTA and BLAST, this method is guaranteed to find all sequences whose scores against the query are higher than the specified threshold. Results of computational experiments suggest that the method is dozens of times faster than SSEARCH if genome sequence data of closely related species are available.
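The filtering step described in the Results can be illustrated with a minimal sketch. The abstract does not state the inequality itself, so the code below substitutes a trivial length-based bound (`length_bound`, a stand-in and not the paper's bound) purely to show the control flow: an SW alignment is computed only when the upper bound exceeds the threshold, and because the bound never underestimates the true score, no hit above the threshold is lost. The function names, the scoring parameters (match 2, mismatch -1, linear gap -2), and the toy sequences are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def length_bound(query: str, target: str, match: int = 2) -> int:
    """Stand-in upper bound (NOT the paper's inequality): a local alignment
    cannot score more than `match` times the length of the shorter sequence."""
    return match * min(len(query), len(target))


def search(
    query: str,
    database: Dict[str, str],
    threshold: int,
    upper_bound: Callable[[str, str], int] = length_bound,
) -> Tuple[Dict[str, int], int]:
    """Return (hits scoring above threshold, number of SW alignments computed)."""
    hits: Dict[str, int] = {}
    aligned = 0
    for name, seq in database.items():
        # Skip the alignment when the bound proves the score cannot exceed the threshold.
        if upper_bound(query, seq) <= threshold:
            continue
        aligned += 1
        score = sw_score(query, seq)
        if score > threshold:
            hits[name] = score
    return hits, aligned


# Toy data, invented for this example.
database = {"a": "ACGTACGT", "b": "AC", "c": "TTTTTTTT"}
hits, aligned = search("ACGTACGT", database, threshold=6)
print(hits, aligned)
```

In this toy run, sequence `b` is skipped without alignment because its bound (4) does not exceed the threshold, while the result set matches what an exhaustive SW scan would return. A tighter bound, such as one derived from stored pairwise SW scores as the abstract describes, would presumably prune far more candidates; the `upper_bound` parameter is where such a bound would plug in.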

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Database Management Systems
  • Databases, Protein*
  • Information Storage and Retrieval / methods*
  • Molecular Sequence Data
  • Proteins / analysis
  • Proteins / chemistry*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid*
  • Software

Substances

  • Proteins