Comparison of Capture Hi-C Analytical Pipelines

Dina Aljogol; I Richard Thompson; Cameron S Osborne; Borbala Mifsud

doi:10.3389/fgene.2022.786501

Front Genet. 2022 Jan 28:13:786501. doi: 10.3389/fgene.2022.786501. eCollection 2022.

Authors

Dina Aljogol¹, I Richard Thompson², Cameron S Osborne³, Borbala Mifsud¹,⁴

Affiliations

¹ College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar.
² Qatar Biomedical Research Institute, Hamad Bin Khalifa University, Doha, Qatar.
³ Department of Medical and Molecular Genetics, King's College London, London, United Kingdom.
⁴ William Harvey Research Institute, Queen Mary University of London, London, United Kingdom.

Abstract

It is now evident that DNA forms an organized nuclear architecture, which is essential to maintain the structural and functional integrity of the genome. Chromatin organization can be systematically studied due to the recent boom in chromosome conformation capture technologies (e.g., 3C and its successors 4C, 5C and Hi-C), which is accompanied by the development of computational pipelines to identify biologically meaningful chromatin contacts in such data. However, not all tools are applicable to all experimental designs and all structural features. Capture Hi-C (CHi-C) is a method that uses an intermediate hybridization step to target and select predefined regions of interest in a Hi-C library, thereby increasing effective sequencing depth for those regions. It allows researchers to investigate fine chromatin structures at high resolution, for instance promoter-enhancer loops, but it introduces additional biases with the capture step, and therefore requires specialized pipelines. Here, we compare multiple analytical pipelines for CHi-C data analysis. We consider the effect of retaining multi-mapping reads and compare the efficiency of different statistical approaches in both identifying reproducible interactions and determining biologically significant interactions. At restriction fragment level resolution, the number of multi-mapping reads that could be rescued was negligible. The number of identified interactions varied widely, depending on the analytical method, indicating large differences in type I and type II error rates. The optimal pipeline depends on the project-specific tolerance level of false positive and false negative chromatin contacts.

Keywords: capture Hi-C; chromatin organization; computational pipeline; epigenetics; gene regulation.