SiTaR: a novel tool for transcription factor binding site prediction

Bioinformatics. 2011 Oct 15;27(20):2806-11. doi: 10.1093/bioinformatics/btr492. Epub 2011 Sep 4.

Abstract

Motivation: Prediction of transcription factor binding sites (TFBSs) is crucial for promoter modeling and network inference. The quality of the predictions is undermined by numerous false positives, which remain the main problem for all currently available TFBS search methods.

Results: We propose a novel approach as an alternative to the widely used position weight matrices (PWMs) and hidden Markov models. Each motif of the input set is used as a search template to scan a query sequence. Found motifs are assigned scores that depend on the non-randomness of the motif's occurrence, the number of matching search motifs and the number of mismatches. The non-randomness is estimated by comparing the observed number of matching motifs with the number predicted to occur by chance. The latter can be calculated from the base compositions of the motif and the query sequence. The method does not require preliminary alignment of the input motifs and thus avoids the uncertainties introduced by the alignment procedure. In comparison with PWM-based tools, our method demonstrates higher precision at the same sensitivity and specificity. It also tends to outperform methods that combine pattern and PWM search. Most importantly, it allows the number of false positive predictions to be reduced significantly.
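
The template scan and the chance expectation described above can be illustrated with a minimal Python sketch. It is not the SiTaR implementation, and the abstract does not give the scoring formula. The function names (`scan`, `expected_exact_matches`), the `max_mismatches` parameter and the independent-base background model are assumptions made here for illustration only.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan(query, motifs, max_mismatches=1):
    """Use each input motif as a template and slide it along the query.

    Returns {position: [(motif, mismatches), ...]} for every window that
    at least one motif matches within max_mismatches (hypothetical cutoff).
    """
    hits = {}
    for motif in motifs:
        k = len(motif)
        for i in range(len(query) - k + 1):
            d = hamming(query[i:i + k], motif)
            if d <= max_mismatches:
                hits.setdefault(i, []).append((motif, d))
    return hits


def base_frequencies(seq):
    """Relative frequencies of A, C, G and T in seq."""
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


def expected_exact_matches(motif, query):
    """Expected number of exact occurrences of motif in query by chance.

    Assumes independent bases drawn with the query's base composition;
    this simple background model is an assumption, not the paper's formula.
    """
    freqs = base_frequencies(query)
    p = 1.0
    for base in motif:
        p *= freqs[base]
    windows = max(0, len(query) - len(motif) + 1)
    return p * windows


if __name__ == "__main__":
    query = "ACGTACGTTT"
    motifs = ["ACGT", "ACGA"]
    for pos, matched in sorted(scan(query, motifs).items()):
        print(pos, matched)
    print(expected_exact_matches("ACGT", query))
```

In this sketch, the number of motifs matching a window and their mismatch counts are read directly from the `scan` result, and the observed number of hits can be compared with the value from `expected_exact_matches`. How SiTaR combines these quantities into a score is not stated in the abstract and is not reproduced here.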

Availability: The method is implemented in a tool called SiTaR (Site Tracking and Recognition) and is available at http://sbi.hki-jena.de/sitar/index.php.

Contact: ekaterina.shelest@hki-jena.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Validation Study

MeSH terms

  • Binding Sites
  • Nucleotide Motifs
  • Promoter Regions, Genetic*
  • Sensitivity and Specificity
  • Sequence Analysis, DNA*
  • Software*
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors