Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts

Genome Res. 2014 Jun;24(6):999-1011. doi: 10.1101/gr.160374.113. Epub 2014 Feb 5.

Abstract

Our current understanding of how DNA is packed in the nucleus is most accurate at the fine scale of individual nucleosomes and at the large scale of chromosome territories. However, accurate modeling of DNA architecture at the intermediate scale of ∼50 kb to 10 Mb is crucial for identifying functional interactions among regulatory elements and their target promoters. We describe a method, Fit-Hi-C, that assigns statistical confidence estimates to mid-range intra-chromosomal contacts by jointly modeling the random polymer looping effect and previously observed technical biases in Hi-C data sets. We demonstrate that our proposed approach computes accurate empirical null models of contact probability without any distribution assumption, corrects for binning artifacts, and provides improved statistical power relative to a previously described method. High-confidence contacts identified by Fit-Hi-C preferentially link expressed gene promoters to active enhancers identified by chromatin signatures in human embryonic stem cells (ESCs), capture 77% of RNA polymerase II-mediated enhancer-promoter interactions identified using ChIA-PET in mouse ESCs, and confirm previously validated, cell line-specific interactions in mouse cortex cells. We observe that insulators and heterochromatin regions are hubs for high-confidence contacts, while promoters and strong enhancers are involved in fewer contacts. We also observe that binding peaks of master pluripotency factors such as NANOG and POU5F1 are highly enriched in high-confidence contacts for human ESCs. Furthermore, we show that pairs of loci linked by high-confidence contacts exhibit similar replication timing in human and mouse ESCs and preferentially lie within the boundaries of topological domains for human and mouse cell lines.
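The general idea of scoring a contact against a distance-dependent null can be illustrated with a minimal sketch. This is not the Fit-Hi-C implementation: it assumes a crude fixed-width distance binning in place of a fitted null model, applies no correction for technical biases, and uses a binomial tail test followed by Benjamini-Hochberg adjustment. The function names (binom_sf, contact_pvalues, benjamini_hochberg), the bin width, and the toy counts are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space.

    The loop is linear in n - k, which is fine for a toy example only.
    """
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def contact_pvalues(pairs, bin_width):
    """Score each locus pair against the average of its distance bin.

    pairs: list of (genomic_distance_bp, contact_count), one entry per
    possible locus pair, including pairs with zero observed contacts.
    Assumption: fixed-width bins stand in for a smooth distance-based null.
    """
    total = sum(count for _, count in pairs)
    if total == 0:
        return [1.0] * len(pairs)
    bin_counts = defaultdict(int)
    bin_sizes = defaultdict(int)
    for distance, count in pairs:
        b = distance // bin_width
        bin_counts[b] += count
        bin_sizes[b] += 1
    pvalues = []
    for distance, count in pairs:
        b = distance // bin_width
        # Null probability that any single read pair links this locus pair.
        prior = bin_counts[b] / bin_sizes[b] / total
        pvalues.append(binom_sf(count, total, prior))
    return pvalues


def benjamini_hochberg(pvalues):
    """Convert p-values to Benjamini-Hochberg adjusted q-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy data (hypothetical): one pair at short range has far more contacts
# than other pairs at a similar distance.
toy_pairs = [
    (10000, 5), (12000, 4), (15000, 6), (18000, 40),
    (60000, 1), (65000, 2), (70000, 1), (72000, 0),
]
toy_p = contact_pvalues(toy_pairs, bin_width=50000)
toy_q = benjamini_hochberg(toy_p)
for (distance, count), p, q in zip(toy_pairs, toy_p, toy_q):
    print(distance, count, p, q)
```

In this toy input, only the pair with 40 contacts stands out from its distance bin. Because that outlier also inflates the bin average, a single-pass null like this one is conservative for the remaining pairs in the same bin.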

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Chromatin / chemistry
  • Chromatin / genetics*
  • Chromatin Assembly and Disassembly*
  • Confidence Intervals
  • Embryonic Stem Cells / metabolism
  • Histone Code
  • Homeodomain Proteins / genetics
  • Homeodomain Proteins / metabolism
  • Humans
  • Mice
  • Models, Genetic*
  • Nanog Homeobox Protein
  • Neurons / metabolism
  • Octamer Transcription Factor-3 / genetics
  • Octamer Transcription Factor-3 / metabolism
  • Protein Binding
  • Regulatory Sequences, Nucleic Acid*
  • Species Specificity
  • Yeasts / genetics

Substances

  • Chromatin
  • Homeodomain Proteins
  • NANOG protein, human
  • Nanog Homeobox Protein
  • Octamer Transcription Factor-3
  • POU5F1 protein, human