Frequency distribution of TATA Box and extension sequences on human promoters

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-7-S4-S2.

Abstract

Background: The TATA box is one of the most important transcription factor binding sites, but its exact sequences are still not well defined.

Results: In this study, we conduct a dedicated analysis of the frequency distribution of the TATA box and its extension sequences on human promoters. Sixteen TATA elements derived from the TATA box motif, TATAWAWN, are classified into three distribution patterns: peak, bottom-peak, and bottom. Fourteen TATA extension sequences are predicted to be new TATA box elements because of their high motif factors, which indicate their statistical significance. Statistical analysis of the promoters of mouse, zebrafish, and Drosophila melanogaster verifies seven of these elements. We also observe that the distribution of TATA elements on the promoters of housekeeping genes is very similar to their distribution on the promoters of tissue-specific genes in human.
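As a rough illustration of where the sixteen elements come from, the sketch below expands TATAWAWN using the standard IUPAC codes (W = A or T, N = any base, so 2 x 2 x 4 = 16 sequences) and tallies the start positions at which an element occurs in a set of promoters. The function names and the toy promoter strings are hypothetical; this is not the authors' pipeline, and it does not compute the motif factor, which the abstract does not define.

```python
from collections import Counter
from itertools import product

# Subset of the IUPAC nucleotide codes needed for TATAWAWN.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}


def expand_motif(motif):
    """Enumerate every concrete sequence matched by an IUPAC motif."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in motif))]


def position_counts(promoters, element):
    """Count occurrences of `element` per start offset across promoters.

    Offsets are 0-based from the start of each sequence; overlapping
    matches are counted.
    """
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        start = seq.find(element)
        while start != -1:
            counts[start] += 1
            start = seq.find(element, start + 1)
    return counts


elements = expand_motif("TATAWAWN")

# Toy sequences for illustration only (not real promoter data).
promoters = ["GGCTATAAAAGGC", "CCTATATATACCG", "GGCTATAAAAGCC"]
for element in ("TATAAAAG", "TATATATA"):
    print(element, dict(position_counts(promoters, element)))
```

With real data, the offsets would typically be re-expressed relative to the transcription start site, and the resulting per-position counts could then be inspected for the peak, bottom-peak, or bottom shapes described above.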

Conclusion: The dedicated statistical analysis of the TATA box and its extension sequences yields new TATA elements. The statistical significance of these elements has been verified on random data sets by calculating their p-values.
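The abstract does not say how the random data sets were built or how the p-values were computed. One common approach, shown here purely as an assumption, is an empirical p-value: shuffle each promoter to preserve its base composition, recount the element, and report the fraction of random sets in which it occurs at least as often as in the real data. All names below are hypothetical.

```python
import random


def count_occurrences(seqs, element):
    """Total number of (possibly overlapping) matches across all sequences."""
    total = 0
    for seq in seqs:
        start = seq.find(element)
        while start != -1:
            total += 1
            start = seq.find(element, start + 1)
    return total


def shuffled(seq, rng):
    """Return a random permutation of `seq` (same base composition)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def empirical_p_value(seqs, element, n_random=1000, seed=0):
    """Fraction of shuffled data sets with at least the observed count.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    rng = random.Random(seed)
    observed = count_occurrences(seqs, element)
    hits = 0
    for _ in range(n_random):
        random_set = [shuffled(s, rng) for s in seqs]
        if count_occurrences(random_set, element) >= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)


# Toy input for illustration only (not real promoter data).
toy = ["GGCTATAAAAGGC", "GGCTATAAAAGCC", "CCTATAAAAGGCG"]
print(empirical_p_value(toy, "TATAAAAG", n_random=500))
```

A simple shuffle ignores dinucleotide structure and positional bias, so a fuller analysis would probably use a richer background model; the sketch only shows the shape of the calculation.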

MeSH terms

  • Base Sequence
  • Binding Sites
  • Data Interpretation, Statistical
  • Gene Frequency / genetics
  • Humans
  • Molecular Sequence Data
  • Promoter Regions, Genetic / genetics*
  • Protein Binding
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*
  • Statistical Distributions
  • TATA Box / genetics*
  • Transcription Factors / genetics*

Substances

  • Transcription Factors