Uncovering tissue-specific binding features from differential deep learning

Nucleic Acids Res. 2020 Mar 18;48(5):e27. doi: 10.1093/nar/gkaa009.

Abstract

Transcription factors (TFs) can bind DNA in a cooperative manner, enabling a mutual increase in occupancy. Through this type of interaction, alternative binding sites can be preferentially bound in different tissues to regulate tissue-specific expression programmes. Recently, deep learning models have become state-of-the-art in various pattern analysis tasks, including applications in the field of genomics. We therefore investigate the application of convolutional neural network (CNN) models to the discovery of sequence features determining cooperative and differential TF binding across tissues. We analyse ChIP-seq data for the MEIS TFs, which are broadly expressed across mouse branchial arches, and for HOXA2, which is expressed in the second and more posterior branchial arches. By developing models predictive of MEIS differential binding in all three tissues, we are able to accurately predict HOXA2 co-binding sites. We evaluate transfer-like and multitask approaches to regularizing the high-dimensional classification task with a larger regression dataset, allowing for the creation of deeper and more accurate models. We test the performance of perturbation-based and gradient-based attribution methods in identifying the HOXA2 sites from differential MEIS data. Our results show that deep regularized models significantly outperform shallow CNNs as well as k-mer methods in the discovery of tissue-specific sites bound in vivo.
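
To make the idea of perturbation-based attribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-hot encoded DNA sequence and replaces the trained CNN with a hypothetical stand-in, `toy_model`, which scores a sequence by its best match to a single motif. The motif string, the example sequence and all function names are assumptions chosen for illustration only. Each position is scored by the average drop in model output when its base is mutated to each of the three alternatives.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot array."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]


def toy_model(x, motif="TGACAG"):
    """Hypothetical stand-in for a trained CNN (assumption, not from the paper).

    Returns the highest number of matching bases between the motif and any
    window of the one-hot sequence x.
    """
    m = one_hot(motif)
    k = len(motif)
    scores = [np.sum(x[i:i + k] * m) for i in range(x.shape[0] - k + 1)]
    return float(max(scores))


def perturbation_attribution(seq, model):
    """Score each position by the mean drop in model output over the three
    possible single-base substitutions at that position."""
    x = one_hot(seq)
    base_score = model(x)
    attr = np.zeros(len(seq))
    for pos in range(len(seq)):
        drops = []
        for b in range(4):
            if x[pos, b] == 1:
                continue  # skip the reference base
            mutated = x.copy()
            mutated[pos] = 0
            mutated[pos, b] = 1
            drops.append(base_score - model(mutated))
        attr[pos] = np.mean(drops)
    return attr


# Illustrative sequence with one motif instance embedded in a poly-A background.
seq = "AAAATGACAGAAAA"
attr = perturbation_attribution(seq, toy_model)
print(np.round(attr, 2))
```

In this toy setting, positions whose mutation lowers the model output receive positive attribution, so the embedded motif stands out from the background. With a real trained network, `model` would be replaced by the network's forward pass on the one-hot input.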

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Binding Sites
  • Branchial Region / growth & development
  • Branchial Region / metabolism*
  • Chromatin Immunoprecipitation
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Deep Learning*
  • Embryo, Mammalian
  • Gene Expression Regulation, Developmental
  • High-Throughput Nucleotide Sequencing
  • Homeodomain Proteins / genetics*
  • Homeodomain Proteins / metabolism
  • Mice
  • Models, Genetic
  • Myeloid Ecotropic Viral Integration Site 1 Protein / genetics*
  • Myeloid Ecotropic Viral Integration Site 1 Protein / metabolism
  • Organ Specificity
  • Poisson Distribution
  • Protein Binding
  • Protein Isoforms / genetics
  • Protein Isoforms / metabolism
  • RNA / genetics*
  • RNA / metabolism

Substances

  • Homeodomain Proteins
  • Hoxa2 protein, mouse
  • Meis1 protein, mouse
  • Myeloid Ecotropic Viral Integration Site 1 Protein
  • Protein Isoforms
  • RNA