Identifying TAD-like domains on single-cell Hi-C data by graph embedding and changepoint detection

Bioinformatics. 2024 Mar 4;40(3):btae138. doi: 10.1093/bioinformatics/btae138.

Abstract

Motivation: Topologically associating domains (TADs) are fundamental building blocks of 3D genome. TAD-like domains in single cells are regarded as the underlying genesis of TADs discovered in bulk cells. Understanding the organization of TAD-like domains helps to get deeper insights into their regulatory functions. Unfortunately, it remains a challenge to identify TAD-like domains on single-cell Hi-C data due to its ultra-sparsity.

Results: We propose scKTLD, an in silico tool for the identification of TAD-like domains on single-cell Hi-C data. It takes Hi-C contact matrix as the adjacency matrix for a graph, embeds the graph structures into a low-dimensional space with the help of sparse matrix factorization followed by spectral propagation, and the TAD-like domains can be identified using a kernel-based changepoint detection in the embedding space. The results tell that our scKTLD is superior to the other methods on the sparse contact matrices, including downsampled bulk Hi-C data as well as simulated and experimental single-cell Hi-C data. Besides, we demonstrated the conservation of TAD-like domain boundaries at single-cell level apart from heterogeneity within and across cell types, and found that the boundaries with higher frequency across single cells are more enriched for architectural proteins and chromatin marks, and they preferentially occur at TAD boundaries in bulk cells, especially at those with higher hierarchical levels.

Availability and implementation: scKTLD is freely available at https://github.com/lhqxinghun/scKTLD.

MeSH terms

  • Chromatin*
  • Chromosomes*
  • Genome

Substances

  • Chromatin