Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data

Bioinformatics. 2015 Mar 15;31(6):960-2. doi: 10.1093/bioinformatics/btu747. Epub 2014 Nov 12.

Abstract

Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools to study spatial genome organization. Removing biases from chromatin contact matrices generated by such techniques is a critical preprocessing step of subsequent analyses. The continuing decline of sequencing costs has led to an ever-improving resolution of the Hi-C data, resulting in very large matrices of chromatin contacts. Such large-size matrices, however, pose a great challenge on the memory usage and speed of its normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalization of Hi-C data. We developed Hi-Corrector, an easy-to-use, open source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability-the software is capable of normalizing Hi-C data of any size in reasonable times; (ii) memory efficiency-the sequential version can run on any single computer with very limited memory, no matter how little; (iii) fast speed-the parallel version can run very fast on multiple computing nodes with limited local memory.

Availability and implementation: The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems). The package is freely available at http://zhoulab.usc.edu/Hi-Corrector/.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Chromatin / chemistry
  • Chromatin / genetics*
  • Chromosome Mapping
  • Chromosomes, Human / chemistry
  • Chromosomes, Human / genetics*
  • Genome, Human*
  • Genomic Library
  • Humans
  • Models, Statistical
  • Software*

Substances

  • Chromatin