A widespread role of the motif environment in transcription factor binding across diverse protein families

Genome Res. 2015 Sep;25(9):1268-80. doi: 10.1101/gr.184671.114. Epub 2015 Jul 9.

Abstract

Transcriptional regulation requires the binding of transcription factors (TFs) to short sequence-specific DNA motifs, usually located in gene regulatory regions. A large body of data from genomic assays has shown that only a small fraction of the potential binding sites containing the consensus motif of a given TF are actually bound by the protein. Recent in vitro binding assays, which exclude the effects of the cellular environment, also demonstrate selective TF binding. An intriguing conjecture is that the surroundings of cognate binding sites have unique characteristics that distinguish them from other sequences that contain a similar motif but are not bound by the TF. To test this hypothesis, we conducted a comprehensive analysis of the sequence and DNA shape features surrounding the core-binding sites of 239 TFs extracted from in vitro HT-SELEX binding assays and 56 TFs extracted from in vivo ChIP-seq data. Comparing the nucleotide content of the regions around the TF-bound sites with that of the counterpart unbound regions containing the same consensus motifs revealed significant differences that extend far beyond the core-binding site. Specifically, the environment of the bound motifs demonstrated unique sequence compositions, DNA shape features, and overall high similarity to the core-binding motif. Notably, the regions around the binding sites of TFs that belong to the same TF family exhibited similar features, with high agreement between the in vitro and in vivo data sets. We propose that these unique features assist in guiding TFs to their cognate binding sites.
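
The comparison of nucleotide content around bound and unbound motif occurrences can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`flank_composition`, `gc_content`), the exact-match motif search, the flank width, and the toy sequences are all assumptions made for illustration, and the abstract does not specify how the flanking regions were defined or tested.

```python
from collections import Counter


def flank_composition(seqs, motif, flank=4):
    """Nucleotide frequencies in the flanks of the first exact motif match.

    Hypothetical helper: a real analysis would likely scan with a position
    weight matrix and handle both strands; here we use a plain string search.
    """
    counts = Counter()
    for seq in seqs:
        start = seq.find(motif)
        if start == -1:
            continue  # skip sequences that lack the motif
        end = start + len(motif)
        left = seq[max(0, start - flank):start]
        right = seq[end:end + flank]
        counts.update(left + right)
    total = sum(counts.values())
    return {nt: (counts[nt] / total if total else 0.0) for nt in "ACGT"}


def gc_content(freqs):
    """Combined G and C fraction from a frequency dictionary."""
    return freqs["G"] + freqs["C"]


# Toy sequences invented for illustration; all contain the same core motif.
motif = "CACGTG"
bound = ["GCGCCACGTGGCGC", "CCGGCACGTGCCGG"]
unbound = ["ATATCACGTGATAT", "TTAACACGTGTTAA"]

bound_freqs = flank_composition(bound, motif, flank=4)
unbound_freqs = flank_composition(unbound, motif, flank=4)

print("bound flank GC:", gc_content(bound_freqs))
print("unbound flank GC:", gc_content(unbound_freqs))
```

In this toy setup the two groups share an identical core motif and differ only in their flanks, which is the kind of contrast the abstract describes. A real comparison would need many sequences per TF and a statistical test for the difference.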

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Base Composition
  • Base Sequence
  • Binding Sites*
  • Computational Biology / methods
  • Gene Expression Regulation
  • Genomics / methods
  • Humans
  • Nucleotide Motifs*
  • Regulatory Elements, Transcriptional
  • Regulatory Sequences, Nucleic Acid
  • SELEX Aptamer Technique
  • Transcription Factors / metabolism*
  • Transcription, Genetic

Substances

  • Transcription Factors