Normalizing Metagenomic Hi-C Data and Detecting Spurious Contacts Using Zero-Inflated Negative Binomial Regression

J Comput Biol. 2022 Feb;29(2):106-120. doi: 10.1089/cmb.2021.0439. Epub 2022 Jan 12.

Abstract

High-throughput chromosome conformation capture (Hi-C) has recently been applied to natural microbial communities and has shown great potential for studying multiple genomes simultaneously. Several extraneous factors may influence chromosomal contacts, making normalization of Hi-C contact maps essential for downstream analyses. However, the paucity of metagenomic Hi-C normalization methods and the neglect of spurious interspecies contacts weaken the interpretability of the data. Here, we report two types of biases in metagenomic Hi-C experiments, explicit biases and implicit biases, and introduce HiCzin, a parametric model that corrects both types of biases and removes spurious interspecies contacts. We demonstrate that metagenomic Hi-C contact maps normalized by HiCzin show lower biases, a higher capability to detect spurious contacts, and better performance in metagenomic contig clustering.

Keywords: metagenomic Hi-C; normalization; spurious contact detection.
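
For readers unfamiliar with the distribution named in the title, the following is a minimal sketch of the generic zero-inflated negative binomial (ZINB) probability mass function. It is not HiCzin's implementation: the function names and the parameters mu (mean), theta (dispersion), and pi (probability of a structural zero) are assumptions chosen for illustration, and the abstract does not specify how the model's covariates enter.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial pmf parameterized by mean mu > 0 and dispersion theta > 0."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, mu, theta, pi):
    """Zero-inflated negative binomial pmf.

    With probability pi the count is a structural zero; otherwise it is
    drawn from a negative binomial with mean mu and dispersion theta.
    """
    nb = nb_pmf(k, mu, theta)
    if k == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb
```

In a regression setting, mu (and possibly pi) would typically be linked to covariates; that step is omitted here because the abstract does not describe it.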

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Bias
  • Chromosomes / genetics
  • Computational Biology
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Linear Models
  • Logistic Models
  • Metagenome
  • Metagenomics / statistics & numerical data*
  • Microbiota / genetics
  • Regression Analysis
  • Software
  • Yeasts / genetics