Tree-structured algorithm for long weak motif discovery

Bioinformatics. 2011 Oct 1;27(19):2641-7. doi: 10.1093/bioinformatics/btr459. Epub 2011 Aug 5.

Abstract

Motivation: Motifs in DNA sequences often appear in degenerate form, which has increased interest in computational algorithms for weak motif discovery. Probabilistic algorithms are unable to detect weak motifs, while exact methods have been able to detect only short weak motifs. This article proposes an exact tree-based motif detection (TreeMotif) algorithm capable of discovering longer and weaker motifs than existing methods.

Results: TreeMotif converts the graphical representation of motifs into a tree-structured representation, in which a tree that branches with nodes from every sequence represents motif instances. This method of tree construction is novel among motif discovery approaches based on graphical representation. TreeMotif is more efficient and scalable than the existing algorithms in handling longer and weaker motifs, in terms of both accuracy and execution time. The performance of TreeMotif was demonstrated on synthetic data as well as on real biological data.
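To make the idea of a tree that takes one node from every sequence more concrete, the following is a minimal sketch and not the TreeMotif algorithm itself, whose construction rules are not given in this abstract. It assumes the common graph formulation of motif search: each substring of length l is a node, two nodes from different sequences are compatible when their Hamming distance is at most 2d, and a candidate set of motif instances is a root-to-leaf path that picks one compatible node per sequence. The names `hamming`, `lmers`, `find_instance_sets` and the parameters `l` and `d` are illustrative assumptions.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def lmers(seq: str, l: int) -> List[Tuple[int, str]]:
    """All (start position, substring) pairs of length l in seq."""
    return [(i, seq[i:i + l]) for i in range(len(seq) - l + 1)]


def find_instance_sets(seqs: List[str], l: int, d: int) -> List[List[int]]:
    """Depth-first tree search over candidate motif instances.

    Assumption (not from the abstract): two instances of the same motif,
    each with at most d mutations, differ in at most 2*d positions, so a
    branch is extended only with substrings within 2*d of every substring
    already on the path. Level k of the tree holds nodes from sequence k.
    """
    candidates = [lmers(s, l) for s in seqs]
    results: List[List[int]] = []

    def extend(depth: int, path: List[Tuple[int, str]]) -> None:
        if depth == len(seqs):
            # A full path has one node per sequence: record its positions.
            results.append([pos for pos, _ in path])
            return
        for pos, word in candidates[depth]:
            if all(hamming(word, prev) <= 2 * d for _, prev in path):
                extend(depth + 1, path + [(pos, word)])

    extend(0, [])
    return results


if __name__ == "__main__":
    # Toy data: variants of ACGTAC placed at position 2 of each sequence.
    toy = ["TTACGTACGG", "GGACGAACTT", "CCACGTTCAA"]
    for positions in find_instance_sets(toy, l=6, d=1):
        print(positions)
```

This exhaustive search is exponential in the worst case and may report paths that share no single consensus within d mutations; it only illustrates the tree-of-instances representation, not the efficiency or pruning reported for TreeMotif.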

Availability: https://sites.google.com/site/shqssw/treemotif

Contact: sunh0013@e.ntu.edu.sg

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics*
  • Base Sequence
  • Gene Expression Regulation / genetics*
  • Models, Genetic
  • Transcription Factors / genetics*

Substances

  • Transcription Factors