Optimizing multiple spaced seeds for homology search

J Comput Biol. 2006 Sep;13(7):1355-68. doi: 10.1089/cmb.2006.13.1355.

Abstract

Optimized spaced seeds improve sensitivity and specificity in local homology search. Several authors have shown that multiple seeds can have better sensitivity and specificity than single seeds. We describe a linear programming (LP)-based algorithm to optimize a set of seeds. Theoretically, our algorithm offers a performance guarantee: in most reasonable models of homologous sequences, the sensitivity of the chosen seed set is at least 70% of the best achievable sensitivity. In practice, our algorithm generates a solution whose sensitivity is at least 90% of the optimum. Our method not only achieves performance better than or comparable to that of a greedy algorithm, but also gives this area a mathematical foundation.
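
To make the quantity being optimized concrete, the following is a minimal Python sketch of what "sensitivity of a seed set" means. It is not the LP-based algorithm described above. It assumes the common convention that a seed is a string in which '1' marks a position that must match and '0' marks a don't-care position, and that an alignment is a string of '1' (matched column) and '0' (mismatched column). The function names `hits` and `sensitivity`, and the independent per-column match probability `p`, are illustrative assumptions and may differ from the homology models used in the paper.

```python
from itertools import product


def hits(seed: str, alignment: str) -> bool:
    """Return True if the seed matches the alignment at some offset.

    seed:      '1' = column must be a match, '0' = don't care (assumed notation).
    alignment: '1' = matched column, '0' = mismatched column.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    # Slide the seed along the alignment; an empty range means the seed
    # is longer than the alignment and therefore cannot hit.
    for start in range(len(alignment) - len(seed) + 1):
        if all(alignment[start + i] == "1" for i in required):
            return True
    return False


def sensitivity(seeds, length: int, p: float) -> float:
    """Exact probability that at least one seed hits a random alignment.

    Assumes each of the `length` columns matches independently with
    probability p (a simplifying assumption for illustration only).
    Enumerates all 2**length alignments, so keep `length` small.
    """
    total = 0.0
    for cols in product("01", repeat=length):
        alignment = "".join(cols)
        if any(hits(s, alignment) for s in seeds):
            k = alignment.count("1")
            total += p**k * (1 - p) ** (length - k)
    return total


if __name__ == "__main__":
    single = ["1101"]
    multiple = ["1101", "1011"]
    print(sensitivity(single, 12, 0.7))
    print(sensitivity(multiple, 12, 0.7))
```

Adding a seed to a set can never lower its sensitivity under this definition, since every alignment hit by the smaller set is still hit by the larger one. The optimization problem is to choose which seeds to add so that the gain is as large as possible.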

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Computational Biology / methods*
  • Genome / genetics
  • Humans
  • Markov Chains
  • Sequence Homology*