Generative modeling of multi-mapping reads with mHi-C advances analysis of Hi-C studies

Elife. 2019 Jan 31;8:e38070. doi: 10.7554/eLife.38070.

Abstract

Current Hi-C analysis approaches are unable to account for reads that align to multiple locations, and hence underestimate biological signal from repetitive regions of genomes. We developed and validated mHi-C, a multi-read mapping strategy to probabilistically allocate Hi-C multi-reads. On benchmarks, mHi-C outperformed both analyses that use only uni-reads and heuristic approaches aimed at rescuing multi-reads. Specifically, mHi-C increased the sequencing depth by an average of 20%, resulting in higher reproducibility of contact matrices and of detected interactions across biological replicates. The impact of the multi-reads on the detection of significant interactions is influenced marginally by the relative contribution of multi-reads to the sequencing depth compared to uni-reads, the cis-to-trans ratio of contacts, and the broad data quality as reflected by the proportion of mappable reads in each dataset. Computational experiments highlighted that in Hi-C studies with short read lengths, multi-reads rescued by mHi-C can emulate the effect of longer reads. mHi-C also revealed biologically supported bona fide promoter-enhancer interactions and topologically associating domains involving repetitive genomic regions, thereby unlocking a previously masked portion of the genome for conformation capture studies.
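To make the idea of probabilistic multi-read allocation concrete, the sketch below shows a generic expectation-maximization (EM) loop: each multi-read is split fractionally across its candidate bin-pair contacts in proportion to how strongly each candidate is supported by uni-reads plus the current fractional multi-read allocations. This is a simplified illustration under stated assumptions, not the mHi-C implementation. The function name `allocate_multireads`, the `pseudocount` parameter (standing in for a simple prior), and the representation of contacts as unordered bin pairs are all hypothetical, and the sketch omits any additional terms the published generative model uses.

```python
from collections import defaultdict


def allocate_multireads(uni_pairs, multi_reads, n_iter=50, pseudocount=1.0):
    """Hypothetical EM sketch for fractional allocation of Hi-C multi-reads.

    uni_pairs:   list of (bin_a, bin_b) contacts from uniquely mapped read pairs.
    multi_reads: list of lists; each inner list holds the candidate (bin_a, bin_b)
                 contacts one multi-mapping read pair may have come from
                 (assumed to contain at least one candidate).
    Returns, for each multi-read, posterior probabilities over its candidates.
    """

    def key(pair):
        # Treat contacts as unordered: (5, 1) and (1, 5) are the same bin pair.
        a, b = pair
        return (a, b) if a <= b else (b, a)

    # Fixed evidence from uni-reads.
    uni_counts = defaultdict(float)
    for pair in uni_pairs:
        uni_counts[key(pair)] += 1.0

    # Initialize with a uniform split across each read's candidates.
    post = [[1.0 / len(cands)] * len(cands) for cands in multi_reads]

    for _ in range(n_iter):
        # M-step: expected contact counts = uni-reads + fractional multi-reads.
        counts = defaultdict(float, uni_counts)
        for cands, probs in zip(multi_reads, post):
            for pair, w in zip(cands, probs):
                counts[key(pair)] += w

        # E-step: each candidate's posterior is proportional to its current
        # expected count plus a pseudocount (assumed simple prior).
        new_post = []
        for cands in multi_reads:
            weights = [counts[key(pair)] + pseudocount for pair in cands]
            total = sum(weights)
            new_post.append([w / total for w in weights])
        post = new_post

    return post


if __name__ == "__main__":
    uni = [(1, 5)] * 8 + [(2, 6)] * 2
    multi = [[(1, 5), (2, 6)]]
    # The single multi-read leans toward the better-supported contact;
    # with these inputs the allocation approaches [0.75, 0.25].
    print(allocate_multireads(uni, multi))
```

In this toy example, the multi-read is assigned mostly to the bin pair (1, 5) because eight uni-reads support it versus two for (2, 6). The fractional weights can then be added to a contact matrix instead of discarding the read, which is the general effect of recovering multi-reads that the abstract describes.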

Keywords: Hi-C; chromosome chromatin capture; computational biology; human; mouse; multi-reads; probabilistic modeling; systems biology.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line
  • Chromatin / genetics*
  • Computer Simulation
  • Enhancer Elements, Genetic / genetics
  • Genomics / methods*
  • Humans
  • Probability
  • Promoter Regions, Genetic
  • Reproducibility of Results

Substances

  • Chromatin

Associated data

  • Dryad/10.5061/dryad.v7k3140
  • GEO/GSE43070
  • GEO/GSE50199
  • GEO/GSE63525
  • GEO/GSE35156
  • GEO/GSE92819
  • GEO/GSE96107