Base-resolution models of transcription-factor binding reveal soft motif syntax

Nat Genet. 2021 Mar;53(3):354-366. doi: 10.1038/s41588-021-00782-6. Epub 2021 Feb 18.

Abstract

The arrangement (syntax) of transcription factor (TF) binding motifs is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using clustered regularly interspaced short palindromic repeat (CRISPR)-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.
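The abstract says that BPNet uses DNA sequence to predict base-resolution ChIP-nexus binding profiles. The sketch below is a rough illustration of what scoring such a prediction can look like, under stated assumptions. It one-hot encodes a sequence, which is the common input representation for sequence-based deep learning models. It then scores an observed per-base read-count profile against predicted per-base logits using a multinomial negative log-likelihood. The model architecture is omitted. Treating the profile shape as a multinomial over positions is an assumption of this sketch, not a description of the paper's code. The function names `one_hot`, `softmax`, and `multinomial_nll` are hypothetical and not part of any BPNet release.

```python
import math
from typing import List, Sequence

# Column order for the one-hot encoding (an assumption of this sketch).
BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as a (length x 4) matrix in A, C, G, T order.

    Any non-ACGT character (e.g. N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-position logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_nll(counts: Sequence[int], logits: Sequence[float]) -> float:
    """Negative log-likelihood of observed per-base read counts.

    Hypothetical scoring function: it assumes reads are distributed across
    positions according to a multinomial with probabilities softmax(logits).
    Only the *shape* of the profile is scored here. The total read count
    would need a separate term, which this sketch omits.
    """
    if len(counts) != len(logits):
        raise ValueError("counts and logits must have the same length")
    probs = softmax(logits)
    n = sum(counts)
    # log of the multinomial coefficient: log n! - sum_i log c_i!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # positions with zero reads contribute nothing to the likelihood
    log_lik = log_coef + sum(
        c * math.log(p) for c, p in zip(counts, probs) if c > 0
    )
    return -log_lik


if __name__ == "__main__":
    x = one_hot("ACGTN")
    # A predicted profile peaked where reads pile up scores better (lower NLL)
    # than one peaked elsewhere.
    observed = [0, 1, 8, 1, 0]
    good = [0.0, 1.0, 3.0, 1.0, 0.0]
    bad = [3.0, 1.0, 0.0, 1.0, 0.0]
    print(multinomial_nll(observed, good), multinomial_nll(observed, bad))
```

Separating profile shape (multinomial) from total signal strength is a design choice made here for clarity. A model predicting real ChIP-nexus data would also need to handle both strands and the overall read count.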

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Binding Sites
  • Chromatin Immunoprecipitation
  • Clustered Regularly Interspaced Short Palindromic Repeats
  • Computational Biology / methods*
  • Deep Learning
  • Mice
  • Mouse Embryonic Stem Cells / physiology
  • Nanog Homeobox Protein / metabolism
  • Neural Networks, Computer
  • Nucleotide Motifs*
  • Octamer Transcription Factor-3 / metabolism
  • Reproducibility of Results
  • SOXB1 Transcription Factors / metabolism
  • Transcription Factors / metabolism*

Substances

  • Nanog Homeobox Protein
  • Nanog protein, mouse
  • Octamer Transcription Factor-3
  • Pou5f1 protein, mouse
  • SOXB1 Transcription Factors
  • Sox2 protein, mouse
  • Transcription Factors