Scanning sequences after Gibbs sampling to find multiple occurrences of functional elements

BMC Bioinformatics. 2006 Sep 8;7:408. doi: 10.1186/1471-2105-7-408.

Abstract

Background: Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set.

Results: We describe an improvement to the A-GLAM computer program, which uses Gibbs sampling to predict regulatory elements within DNA sequences. The improvement adds an optional "scanning step" after Gibbs sampling. Gibbs sampling produces a position-specific scoring matrix (PSSM). The new scanning step resembles an iterative PSI-BLAST search based on the PSSM. First, it assigns an "individual score" to each subsequence of appropriate length within the input sequences using the initial PSSM. Second, it computes an E-value from each individual score, to assess the agreement between the corresponding subsequence and the PSSM. Third, it permits subsequences with E-values falling below a threshold to contribute to the underlying PSSM, which is then updated using the Bayesian calculus. A-GLAM iterates its scanning step to convergence, at which point no new subsequences contribute to the PSSM. After convergence, A-GLAM reports the predicted regulatory elements within each sequence in order of increasing E-value, so users have a statistical evaluation of the predicted elements in a convenient presentation. Thus, although the Gibbs sampling step in A-GLAM finds at most one regulatory element per input sequence, the scanning step can now rapidly locate further instances of the element in each sequence.
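The score, E-value, update, and iterate loop described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not A-GLAM's implementation: the function names (`build_pssm`, `null_scores`, `scan`), the uniform background, the pseudocount of 0.5, and the default threshold are all hypothetical. The E-value here is a simple stand-in, the exact p-value over all words of the motif width multiplied by the number of windows scanned, because the abstract does not specify A-GLAM's statistics.

```python
import math
from bisect import bisect_left
from itertools import product

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background base frequency


def build_pssm(sites, pseudocount=0.5):
    """Log-odds PSSM from aligned sites; pseudocounts stand in for a Bayesian prior."""
    pssm = []
    for col in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / BACKGROUND) for b in ALPHABET})
    return pssm


def score(pssm, word):
    """Individual score of one subsequence: sum of per-column log-odds."""
    return sum(col[base] for col, base in zip(pssm, word))


def null_scores(pssm):
    """Sorted scores of every possible word (4^width); feasible only for short motifs."""
    return sorted(score(pssm, w) for w in product(ALPHABET, repeat=len(pssm)))


def scan(sequences, seeds, width, e_threshold=0.01, max_iter=20):
    """Iterate scanning until no new subsequence joins the PSSM.

    seeds: (sequence index, start) pairs, e.g. one site per sequence from a
    prior Gibbs sampling run. Returns [((seq_index, start), e_value), ...]
    sorted by increasing E-value. Overlapping hits are not filtered here.
    """
    windows = [(i, j) for i, s in enumerate(sequences)
               for j in range(len(s) - width + 1)]
    members = set(seeds)
    hits = {}
    for _ in range(max_iter):
        pssm = build_pssm([sequences[i][j:j + width] for i, j in sorted(members)])
        null = null_scores(pssm)
        hits = {}
        for i, j in windows:
            s = score(pssm, sequences[i][j:j + width])
            # p-value: fraction of all words scoring at least s
            p = (len(null) - bisect_left(null, s)) / len(null)
            e = p * len(windows)  # expected number of such windows by chance
            if e <= e_threshold:
                hits[(i, j)] = e
        updated = members | set(hits)
        if updated == members:  # converged: no new contributors
            break
        members = updated
    return sorted(hits.items(), key=lambda item: item[1])
```

Called as `scan(sequences, seeds, width)` with one seed site per sequence, the function returns every window whose E-value passes the threshold, including second and later occurrences within a sequence that the seeds did not cover. Enumerating all words grows as 4^width, so a practical tool would need an analytic or sampled score distribution for longer motifs.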

Conclusion: Datasets from experiments determining the binding sites of transcription factors were used to evaluate the improvement to A-GLAM. Typically, the datasets included several sequences containing multiple instances of a regulatory motif. The improvement permitted A-GLAM to predict these multiple instances.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Intramural

MeSH terms

  • Animals
  • Base Sequence
  • Drosophila / genetics
  • Markov Chains*
  • Molecular Sequence Data
  • Monte Carlo Method*
  • Regulatory Elements, Transcriptional / genetics*
  • Saccharomyces / genetics
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*