Nucleotide composition-linked divergence of vertebrate core promoter architecture

Genome Res. 2011 Mar;21(3):410-21. doi: 10.1101/gr.111724.110. Epub 2011 Jan 10.

Abstract

Transcription initiation involves the recruitment of basal transcription factors to the core promoter. A variety of core promoter elements exists; however, for most of these motifs, the distribution across species is unknown. Here we report on the comparison of human and amphibian promoter sequences. We have used oligo-capping in combination with deep sequencing to determine transcription start sites in Xenopus tropicalis. To systematically predict regulatory elements, we have developed a de novo motif finding pipeline using an ensemble of computational tools. A comprehensive comparison of human and amphibian promoter sequences revealed both similarities and differences in core promoter architecture. Some of the differences stem from a highly divergent nucleotide composition of Xenopus and human promoters. Whereas the distribution of some core promoter motifs is conserved independently of species-specific nucleotide bias, the frequency of another class of motifs correlates with the single-nucleotide frequencies. This class includes the well-known TATA box and SP1 motifs, which are more abundant in Xenopus and human promoters, respectively. While these motifs are enriched above the local nucleotide background in both organisms, their frequency varies in step with this background. These differences are likely adaptive, as these motifs can recruit TFIID to either CpG island or sharply initiating promoters. Our results highlight both the conserved and diverged aspects of vertebrate transcription, most notably showing co-opted motif usage to recruit the transcriptional machinery to promoters with diverging nucleotide composition. This shows how sweeping changes in nucleotide composition are compatible with highly conserved mechanisms of transcription initiation.
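
The idea of a motif being "enriched above the nucleotide background" can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a simple zero-order background in which each base is drawn independently at its overall frequency in the input sequences, whereas the abstract refers to a local background. All function names are hypothetical and chosen for illustration only.

```python
from collections import Counter


def base_frequencies(seqs):
    """Single-nucleotide frequencies over all input sequences (A, C, G, T only)."""
    counts = Counter()
    for s in seqs:
        counts.update(c for c in s.upper() if c in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters found in input sequences")
    return {b: counts[b] / total for b in "ACGT"}


def expected_occurrences(motif, seqs, freqs):
    """Expected exact-match count under an independent-base background.

    Assumption: every position is an independent draw from `freqs`.
    """
    p = 1.0
    for b in motif.upper():
        p *= freqs[b]
    # Number of windows of motif length across all sequences.
    windows = sum(max(len(s) - len(motif) + 1, 0) for s in seqs)
    return p * windows


def observed_occurrences(motif, seqs):
    """Count exact (possibly overlapping) matches of the motif on the given strand."""
    motif = motif.upper()
    k = len(motif)
    total = 0
    for s in seqs:
        s = s.upper()
        total += sum(1 for i in range(len(s) - k + 1) if s[i:i + k] == motif)
    return total


def enrichment(motif, seqs):
    """Observed / expected ratio; values above 1 suggest enrichment over background."""
    freqs = base_frequencies(seqs)
    exp = expected_occurrences(motif, seqs, freqs)
    obs = observed_occurrences(motif, seqs)
    return obs / exp if exp > 0 else float("inf")
```

Under this simplification, an AT-rich sequence set raises the expected count of an AT-rich word such as TATA, so a motif's raw frequency can track nucleotide composition even when its observed/expected ratio stays above 1. A realistic analysis would use position weight matrices and a local or higher-order background model rather than exact string matches.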

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adaptation, Biological
  • Animals
  • Base Sequence
  • Conserved Sequence*
  • CpG Islands
  • Female
  • Genetic Variation
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Molecular Sequence Data
  • Oligonucleotide Array Sequence Analysis
  • Polymorphism, Single Nucleotide
  • Sequence Homology, Nucleic Acid
  • TATA Box
  • Transcription Factor TFIID / genetics
  • Transcription Factor TFIID / metabolism
  • Transcription Initiation Site
  • Transcription, Genetic*
  • Xenopus

Substances

  • Transcription Factor TFIID

Associated data

  • GEO/GSE21482