Quantification, Dynamic Visualization, and Validation of Bias in ATAC-Seq Data with ataqv

Cell Syst. 2020 Mar 25;10(3):298-306.e4. doi: 10.1016/j.cels.2020.02.009.

Abstract

The assay for transposase-accessible chromatin using sequencing (ATAC-seq) has become the preferred method for mapping chromatin accessibility due to its time and input material efficiency. However, it can be difficult to evaluate data quality and identify sources of technical bias across samples. Here, we present ataqv, a computational toolkit for efficiently measuring, visualizing, and comparing quality control (QC) results across samples and experiments. We use ataqv to analyze 2,009 public ATAC-seq datasets; their QC metrics display a 10-fold range. Tn5 dosage experiments and statistical modeling show that technical variation in the ratio of Tn5 transposase to nuclei and sequencing flowcell density induces systematic bias in ATAC-seq data by changing the enrichment of reads across functional genomic annotations including promoters, enhancers, and transcription-factor-bound regions, with the notable exception of CTCF. ataqv can be integrated into existing computational pipelines and is freely available at https://github.com/ParkerLab/ataqv/.

Keywords: bioinformatics; chromatin; computational genomics; epigenome; tools.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bias
  • Chromatin / genetics
  • Chromatin Immunoprecipitation Sequencing / methods*
  • Computational Biology / methods
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Promoter Regions, Genetic / genetics
  • Quality Control
  • Regulatory Sequences, Nucleic Acid / genetics
  • Sequence Analysis, DNA / methods*
  • Software
  • Transcription Factors / genetics
  • Transposases / genetics
  • Transposases / metabolism

Substances

  • Chromatin
  • Tn5 transposase
  • Transcription Factors
  • Transposases