Conserved short sequences in promoter regions of human genome

J Biomol Struct Dyn. 2010 Apr;27(5):599-610. doi: 10.1080/07391102.2010.10508574.

Abstract

Recognition of promoter elements by transcription factors is one of the earliest and most crucial steps in gene expression and regulation. In prokaryotes, clear signals identify the promoter regions, such as TATAAT at around position -10 and TTGACA at around position -35 relative to the transcription start site (TSS). In eukaryotes, the promoter regions are structurally more complex, and there are no conserved or consensus sequences similar to those found in prokaryotic promoters. We have located a set of GC-rich short sequences (< 8 nt) that are relatively common in human promoter sequences around the TSS (+/- 100 relative to the TSS). These sequences were sorted by their frequency of occurrence in the database, and the 50 most common sequences were used for further studies. The sigmoidal behavior of the high end of the frequency distribution of these sequences suggests the presence of some internal co-operativity. These short sequences are distributed on both sides of the TSS, suggesting that transcription factors probably recognize them both upstream and downstream of the TSS. As eukaryotic promoters lack any conserved sequences, we expect that these short sequences may help relevant transcription factors recognize promoter regions prior to the initiation of transcription. We postulate that a cluster of genes with common short sequences in the promoter region can be recognized by a particular transcription factor. We also found that most of these short sequences are fairly common within miRNA (both mature and stem-loop sequences). Our studies indicate that eukaryotic transcription is more complex than currently believed.
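
The frequency ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the minimum sequence length of 4 nt, the GC-fraction cutoff of 0.6, and the choice to count each short sequence once per promoter window are all assumptions made for illustration; the abstract specifies only that the sequences are shorter than 8 nt, GC rich, taken from +/- 100 around the TSS, and ranked by frequency, with the top 50 retained.

```python
from collections import Counter


def count_short_sequences(promoters, min_len=4, max_len=7):
    """Count, for each short sequence, how many promoter windows contain it.

    `promoters` is an iterable of DNA strings, each assumed to be the
    window around one TSS. min_len is an assumed lower bound; max_len=7
    reflects the "< 8 nt" limit stated in the abstract.
    """
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        seen = set()  # each sequence is counted once per window (assumption)
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                # Skip k-mers containing ambiguous bases such as N.
                if set(kmer) <= set("ACGT"):
                    seen.add(kmer)
        counts.update(seen)
    return counts


def gc_fraction(kmer):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in kmer) / len(kmer)


def top_gc_rich(counts, n=50, min_gc=0.6):
    """Return the n most frequent sequences with GC fraction >= min_gc.

    min_gc is an assumed cutoff; the abstract does not define "GC rich".
    Ties are broken alphabetically so the ranking is deterministic.
    """
    rich = [(kmer, c) for kmer, c in counts.items() if gc_fraction(kmer) >= min_gc]
    rich.sort(key=lambda item: (-item[1], item[0]))
    return rich[:n]
```

Calling `top_gc_rich(count_short_sequences(windows))` on a list of promoter windows would return up to 50 `(sequence, count)` pairs, most frequent first. Counting total occurrences instead of windows containing the sequence is an equally plausible reading of "frequency of occurrence" and would require only replacing the `seen` set with direct updates to `counts`.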

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Conserved Sequence*
  • Databases, Nucleic Acid
  • Genome, Human / genetics*
  • Humans
  • Inverted Repeat Sequences / genetics
  • MicroRNAs / genetics
  • Molecular Sequence Data
  • Nucleotides / genetics
  • Promoter Regions, Genetic*
  • Transcription Initiation Site

Substances

  • MicroRNAs
  • Nucleotides