Exploring the characteristics of sequence elements in proximal promoters of human genes

Genomics. 2004 Dec;84(6):929-40. doi: 10.1016/j.ygeno.2004.08.013.

Abstract

Central to the reconstruction of cis-regulatory networks is the identification and classification of naturally occurring transcription factor-binding sites according to the genes that they control. We examined salient characteristics of 9-mers that occur in various orders and combinations in the proximal promoters of human genes. In a dataset derived with respect to experimentally defined transcription initiation sites, highly ranked 9-mers in some cases corresponded clearly to protein-binding sites in genomic DNA. In a larger dataset, derived with respect to the 5' ends of human ESTs, a subset of the highly ranked 9-mers corresponded to sites for several known transcription factor families (including CREB, ETS, EGR-1, SP1, KLF, MAZ, HIF-1, and STATs) that play important roles in the regulation of vertebrate genes. We identified several highly ranked CpG-containing 9-mers that define sites for interactions with the CREB and ETS families of proteins, and identified potential target genes for these proteins. The results imply that CpG-containing transcription factor-binding sites regulate the expression of genes with important roles in pathways leading to cell-type-specific gene expression and in pathways controlled by complex networks of signaling systems.
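
As a rough illustration of what ranking 9-mers across a promoter set can look like, the sketch below counts, for each 9-mer, the number of promoter sequences in which it occurs and flags the CpG-containing ones. The abstract does not state the ranking criterion used in the study, so ranking by promoter count is an assumption made here for illustration only. The function names (`kmers`, `rank_kmers`, `contains_cpg`) and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

K = 9  # word length examined in the abstract


def kmers(seq, k=K):
    """Yield every k-mer of seq that consists only of A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            yield kmer


def rank_kmers(promoters, k=K):
    """Rank k-mers by the number of promoters that contain them.

    Assumption: the study's actual ranking statistic is not given in the
    abstract; promoter count is used here as a simple stand-in.
    Ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter()
    for seq in promoters:
        counts.update(set(kmers(seq, k)))  # count each promoter at most once
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def contains_cpg(kmer):
    """Return True if the k-mer contains a CpG dinucleotide."""
    return "CG" in kmer


# Hypothetical toy sequences, not real promoters. The first two embed the
# CRE-like core TGACGTCA; the third is a GC-rich stretch.
toy_promoters = [
    "AATGACGTCATGG",
    "CCTGACGTCATAA",
    "GGGGCGGGGCTTT",
]

if __name__ == "__main__":
    for kmer, n in rank_kmers(toy_promoters)[:5]:
        tag = "CpG" if contains_cpg(kmer) else "-"
        print(kmer, n, tag)
```

Counting each promoter once, rather than every occurrence, keeps a single repeat-rich sequence from dominating the ranking. A real analysis would also need a background model and a decision on how to treat reverse complements, neither of which this sketch addresses.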

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • 5' Untranslated Regions / genetics
  • Base Sequence
  • Binding Sites
  • Computational Biology*
  • Databases, Genetic
  • Gene Expression Regulation*
  • Humans
  • Molecular Sequence Data
  • Promoter Regions, Genetic / genetics*
  • Protein Binding
  • Response Elements / genetics*
  • Transcription Factors / metabolism
  • Transcription Initiation Site*
  • Transcription, Genetic / genetics*

Substances

  • 5' Untranslated Regions
  • Transcription Factors