Visualization of the landscape of the read alignment shape of ATAC-seq data using Hellinger distance metric

Genes Cells. 2024 Jan;29(1):5-16. doi: 10.1111/gtc.13082. Epub 2023 Nov 21.

Abstract

Assay for Transposase-Accessible Chromatin using high-throughput sequencing (ATAC-seq) is a popular next-generation sequencing (NGS) technique for measuring chromatin accessibility and identifying open chromatin regions. While the read alignment shape of NGS data has been used together with intensity information in various bioinformatics methods, few studies have focused on pure shape information alone. In this study, we investigated what types of ATAC-seq read alignment shapes are observed in promoter regions and whether the pure shape information is related to other gene features. We introduced a novel concept and pipeline for handling the pure shape information of NGS data as probability distributions and quantifying their dissimilarities using information theory. Based on this concept, we demonstrate that the pure shape information of ATAC-seq data is correlated with chromatin openness and some gene characteristics. On the other hand, our results suggest that the pure shape information of ATAC-seq read alignments is unlikely to contain additional information that explains differences in RNA expression. Our study suggests that viewing the read alignment shape of NGS data as probability distributions enables us to capture the characteristics of the genome-wide landscape of such data in a non-parametric manner.
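The core idea of treating a read alignment shape as a probability distribution and comparing shapes with the Hellinger distance (the metric named in the title) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state how coverage is binned or normalized, so the per-bin coverage vectors, the sum-to-one normalization, and the function names `shape_distribution` and `hellinger` are assumptions made for illustration only.

```python
import numpy as np


def shape_distribution(coverage):
    """Turn a per-bin coverage vector into a probability distribution.

    Assumption: dividing by the total removes intensity (read depth)
    and leaves only the shape of the signal.
    """
    cov = np.asarray(coverage, dtype=float)
    total = cov.sum()
    if total <= 0:
        raise ValueError("coverage must contain at least one read")
    return cov / total


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    H(p, q) = sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2)), bounded in [0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


# Hypothetical promoter coverage profiles (reads per bin).
low_depth = shape_distribution([0, 2, 6, 2, 0])
high_depth = shape_distribution([0, 20, 60, 20, 0])  # same shape, 10x the reads
shifted = shape_distribution([6, 2, 0, 0, 2])         # different shape

same_shape_distance = hellinger(low_depth, high_depth)
different_shape_distance = hellinger(low_depth, shifted)
print(same_shape_distance, different_shape_distance)
```

Under these assumptions, two profiles that differ only in read depth are at distance zero, while profiles with different shapes are separated by a value between 0 and 1, which is what makes the measure independent of signal intensity.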

Keywords: ATAC-seq; clustering; information theory; shape.

MeSH terms

  • Chromatin Immunoprecipitation Sequencing*
  • Chromatin* / genetics
  • Genome
  • High-Throughput Nucleotide Sequencing / methods
  • Sequence Analysis, DNA / methods

Substances

  • Chromatin