Computing exact P-values for DNA motifs

Bioinformatics. 2007 Mar 1;23(5):531-7. doi: 10.1093/bioinformatics/btl662. Epub 2007 Jan 18.

Abstract

Motivation: Many heuristic algorithms have been designed to approximate P-values of DNA motifs described by position weight matrices, for evaluating their statistical significance. They often significantly deviate from the true P-value by orders of magnitude. Exact P-value computation is needed for ranking the motifs. Furthermore, surprisingly, the complexity of the problem is unknown.

Results: We show the problem to be NP-hard, and present MotifRank, software based on dynamic programming, to calculate exact P-values of motifs. We define the exact P-value on a general and more precise model. Asymptotically, MotifRank is faster than the best exact P-value computing algorithm, and is in fact practical. Our experiments clearly demonstrate that MotifRank significantly improves the accuracy of existing approximation algorithms.

Availability: MotifRank is available from http://bio.dlg.cn.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Binding Sites
  • Computer Simulation
  • Models, Statistical
  • Promoter Regions, Genetic*
  • Sequence Analysis, DNA / methods*
  • Software*
  • Transcription Factors / metabolism

Substances

  • Transcription Factors