Integrative prediction of gene expression with chromatin accessibility and conformation data

Epigenetics Chromatin. 2020 Feb 6;13(1):4. doi: 10.1186/s13072-020-0327-0.

Abstract

Background: Enhancers play a fundamental role in orchestrating cell state and development. Although several methods have been developed to identify enhancers, linking them to their target genes is still an open problem. Several theories on the functional mechanisms of enhancers have been proposed, prompting the development of various methods to infer promoter-enhancer interactions (PEIs). Advances in high-throughput techniques that describe the three-dimensional organization of chromatin have paved the way to pinpointing long-range PEIs. Here we investigated whether including PEIs in computational models for the prediction of gene expression improves performance and interpretability.

Results: We have extended our [Formula: see text] framework to include DNA contacts deduced from chromatin conformation capture experiments and compared various methods to determine PEIs using predictive modelling of gene expression from chromatin accessibility data and predicted transcription factor (TF) motif data. We designed a novel machine learning approach that allows the prioritization of TFs binding to distal loop and promoter regions with respect to their importance for gene expression regulation. Our analysis revealed a set of core TFs that are part of enhancer-promoter loops involving YY1 in different cell lines.

Conclusion: We present a novel approach that can be used to prioritize TFs involved in distal and promoter-proximal regulatory events by integrating chromatin accessibility, conformation, and gene expression data. We show that the integration of chromatin conformation data can improve gene expression prediction and aid model interpretability.
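The Results describe assigning TF motif information from accessible regions to genes in two ways: through promoter-proximal windows and through distal regions linked by chromatin-conformation contacts. The sketch below shows one possible version of that feature-construction step. It is not the authors' implementation. Several details are hypothetical illustrations: the function names `gene_tf_features` and `overlaps`, the ±3 kb promoter window, the summing of per-TF scores, and the `P_`/`L_` prefixes. Keeping promoter and loop features in separate groups lets a downstream model weigh a TF's promoter and distal contributions independently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]                  # (chrom, start, end), half-open
Peak = Tuple[str, int, int, Dict[str, float]]  # accessible region + hypothetical per-TF scores


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals lie on the same chromosome and overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def gene_tf_features(
    genes: Dict[str, Tuple[str, int]],
    peaks: List[Peak],
    loops: List[Tuple[Region, Region]],
    window: int = 3000,  # hypothetical promoter half-width around the TSS
) -> Dict[str, Dict[str, float]]:
    """Per-gene TF features split into a promoter group (P_) and a loop group (L_)."""
    features: Dict[str, Dict[str, float]] = {}
    for gene, (chrom, tss) in genes.items():
        promoter: Region = (chrom, max(0, tss - window), tss + window)
        feats: Dict[str, float] = defaultdict(float)

        # Promoter group: sum TF scores of peaks overlapping the promoter window.
        for p_chrom, p_start, p_end, scores in peaks:
            if overlaps((p_chrom, p_start, p_end), promoter):
                for tf, s in scores.items():
                    feats["P_" + tf] += s

        # Loop group: if exactly one loop anchor touches the promoter,
        # the other anchor is treated as a linked distal region.
        distal: List[Region] = []
        for a, b in loops:
            a_hit, b_hit = overlaps(a, promoter), overlaps(b, promoter)
            if a_hit and not b_hit:
                distal.append(b)
            elif b_hit and not a_hit:
                distal.append(a)

        # Count each distal peak once, even if several anchors overlap it.
        linked = {i for i, (c, s, e, _) in enumerate(peaks)
                  if any(overlaps((c, s, e), r) for r in distal)}
        for i in sorted(linked):
            for tf, s in peaks[i][3].items():
                feats["L_" + tf] += s

        features[gene] = dict(feats)
    return features


if __name__ == "__main__":
    genes = {"GENE_A": ("chr1", 10_000)}
    peaks = [
        ("chr1", 9_800, 10_200, {"YY1": 1.5, "CTCF": 0.5}),  # promoter peak
        ("chr1", 50_000, 50_400, {"YY1": 2.0}),              # distal peak at a loop anchor
        ("chr1", 90_000, 90_300, {"GATA1": 3.0}),            # distal peak without a loop
    ]
    loops = [(("chr1", 9_000, 11_000), ("chr1", 49_500, 50_500))]
    print(gene_tf_features(genes, peaks, loops))
    # {'GENE_A': {'P_YY1': 1.5, 'P_CTCF': 0.5, 'L_YY1': 2.0}}
```

In this toy example, the GATA1 peak contributes nothing. It lies outside the promoter window and no contact links it to the gene, which illustrates how conformation data restrict which distal regions are attributed to a gene.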

Keywords: Chromatin accessibility; Chromatin conformation; DNase1-seq; Gene expression prediction; Gene regulation; HiC; HiChIP; Machine learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Chromatin / chemistry*
  • Chromatin / genetics
  • Chromatin Assembly and Disassembly*
  • Enhancer Elements, Genetic*
  • Genomics / methods*
  • HCT116 Cells
  • HeLa Cells
  • Human Umbilical Vein Endothelial Cells / metabolism
  • Humans
  • Jurkat Cells
  • K562 Cells
  • Machine Learning
  • Protein Binding
  • Transcription Factors / chemistry
  • Transcription Factors / metabolism

Substances

  • Chromatin
  • Transcription Factors