A flexible repertoire of transcription factor binding sites and a diversity threshold determines enhancer activity in embryonic stem cells

Genome Res. 2021 Apr;31(4):564-575. doi: 10.1101/gr.272468.120. Epub 2021 Mar 12.

Abstract

Transcriptional enhancers are critical for development and phenotype evolution and are often mutated in disease contexts; however, even in well-studied cell types, the sequence code conferring enhancer activity remains unknown. To examine the enhancer regulatory code for pluripotent stem cells, we identified genomic regions with conserved binding of multiple transcription factors in mouse and human embryonic stem cells (ESCs). Examination of these regions revealed that they contain on average 12.6 conserved transcription factor binding site (TFBS) sequences. Enriched TFBSs are a diverse repertoire of 70 different sequences representing the binding sequences of both known and novel ESC regulators. Using a diverse set of TFBSs from this repertoire was sufficient to construct short synthetic enhancers with activity comparable to native enhancers. Site-directed mutagenesis of conserved TFBSs in endogenous enhancers or TFBS deletion from synthetic sequences revealed a requirement for 10 or more different TFBSs. Furthermore, specific TFBSs, including the POU5F1:SOX2 comotif, are dispensable, despite cobinding the POU5F1 (also known as OCT4), SOX2, and NANOG master regulators of pluripotency. These findings reveal that a TFBS sequence diversity threshold overrides the need for optimized regulatory grammar and individual TFBSs that recruit specific master regulators.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Binding Sites
  • Embryonic Stem Cells / metabolism*
  • Enhancer Elements, Genetic*
  • Humans
  • Mice
  • Pluripotent Stem Cells / metabolism
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors

Grants and funding