The SiteSeeker motif discovery tool

In Silico Biol. 2009;9(1-2):11-22.

Abstract

In this paper we describe conditions of use for SiteSeeker, a recently published tool that offers two basic functions for the classical problem of discovering motifs in a set of promoter sequences. The first, CHECKPROMOTER, assumes that not all of the sequences necessarily possess a common motif of a given length l. In this case, it allows an exact identification of maximal subsets of related promoters; its purpose is to recognize putatively co-regulated genes. The second, CHECKMOTIF, checks whether the given promoters have a common motif. It uses a fast approximation algorithm for which we were able to derive non-trivial, low performance bounds (defined as the ratio of the Hamming distance of the obtained solution to that of a theoretically best solution) for the computed outputs. Both programs use a novel weighted Hamming distance paradigm for evaluating the similarity of sets of l-mers, which makes it possible to compute performance bounds for the proposed motifs. A set of Arabidopsis thaliana (At) promoters was used as a benchmark for a comparative test against five known tools. SiteSeeker was found to significantly outperform these tools.
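
To make the distance-based formulation above concrete, the following is a minimal sketch of how a common motif of length l could be scored with a position-weighted Hamming distance. It is not the CHECKMOTIF algorithm, whose details are not given in this abstract: the function names, the optional `weights` parameter, and the strategy of trying every l-mer found in the input as a candidate are all assumptions made for illustration. For this simple strategy, the triangle inequality guarantees that the total distance is at most twice that of the best possible consensus, which gives a flavor of what a performance bound of this kind looks like.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_hamming(a: str, b: str,
                     weights: Optional[Sequence[float]] = None) -> float:
    """Sum of per-position weights where a and b differ.

    The weighting scheme is a hypothetical stand-in; with weights=None
    this is the ordinary Hamming distance.
    """
    if len(a) != len(b):
        raise ValueError("l-mers must have equal length")
    if weights is None:
        weights = [1.0] * len(a)
    return float(sum(w for x, y, w in zip(a, b, weights) if x != y))


def lmers(seq: str, l: int) -> List[str]:
    """All overlapping substrings of length l in seq."""
    return [seq[i:i + l] for i in range(len(seq) - l + 1)]


def best_occurrence(candidate: str, seq: str,
                    weights: Optional[Sequence[float]] = None
                    ) -> Tuple[float, str]:
    """Closest l-mer of seq to the candidate, with its distance."""
    scored = ((weighted_hamming(candidate, m, weights), m)
              for m in lmers(seq, len(candidate)))
    return min(scored, key=lambda t: t[0])


def approximate_common_motif(promoters: Sequence[str], l: int,
                             weights: Optional[Sequence[float]] = None
                             ) -> Tuple[float, str, List[str]]:
    """Try every l-mer in the input as a candidate consensus.

    Returns (total distance, candidate, closest l-mer per promoter)
    for the candidate with the smallest total distance.
    """
    best: Optional[Tuple[float, str, List[str]]] = None
    for seq in promoters:
        for cand in lmers(seq, l):
            hits = [best_occurrence(cand, s, weights) for s in promoters]
            total = sum(d for d, _ in hits)
            if best is None or total < best[0]:
                best = (total, cand, [m for _, m in hits])
    if best is None:
        raise ValueError("no promoter is at least l characters long")
    return best


if __name__ == "__main__":
    toy_promoters = ["AAACGTAA", "TTACGTTT", "GGACGAGG"]  # made-up data
    print(approximate_common_motif(toy_promoters, 4))
```

A small total distance relative to l and the number of promoters suggests a shared motif; a large one suggests that only a subset of the promoters is related, which is the situation CHECKPROMOTER is described as addressing.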

MeSH terms

  • Algorithms
  • Arabidopsis / genetics*
  • Computational Biology
  • Gene Expression Regulation, Plant*
  • Promoter Regions, Genetic / genetics*
  • Regulatory Sequences, Nucleic Acid*