Combining signal and sequence to detect RNA polymerase initiation in ATAC-seq data

PLoS One. 2020 Apr 30;15(4):e0232332. doi: 10.1371/journal.pone.0232332. eCollection 2020.

Abstract

The assay for transposase-accessible chromatin followed by sequencing (ATAC-seq) is an inexpensive protocol for measuring open chromatin regions. ATAC-seq is also relatively simple and requires fewer cells than many other high-throughput sequencing protocols. It is therefore tractable in numerous settings where other high-throughput assays are challenging or impossible, which makes it important to understand the limits of what can be inferred from ATAC-seq data. In this work, we leverage ATAC-seq to predict the presence of nascent transcription. Nascent transcription assays are the current gold standard for identifying regions of active transcription, including markers for functional transcription factor (TF) binding. We combine mapped short reads from ATAC-seq with the underlying peak sequence to determine regions of active transcription genome-wide. We show that a hybrid signal/sequence representation, classified using recurrent neural networks (RNNs), can identify these regions across different cell types.
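
To make the idea of a hybrid signal/sequence representation concrete, here is a minimal sketch in Python. The abstract does not specify the encoding the authors used, so everything below is an assumption for illustration: the function names `one_hot` and `hybrid_encoding`, the choice of a one-hot sequence matrix, the per-base coverage vector, and the scaling by the peak maximum are all hypothetical. The sketch only shows one plausible way to stack sequence and read signal into a per-position feature matrix that a recurrent network could consume.

```python
import numpy as np

BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as an (L, 4) matrix.

    Bases outside A/C/G/T (for example N) are left as all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(sequence), 4), dtype=np.float32)
    for pos, base in enumerate(sequence.upper()):
        col = index.get(base)
        if col is not None:
            encoded[pos, col] = 1.0
    return encoded


def hybrid_encoding(sequence, coverage):
    """Stack one-hot sequence with per-base read coverage into an (L, 5) matrix.

    Assumption: `coverage` holds one ATAC-seq read count per base of the peak.
    Scaling by the peak maximum is an illustrative choice, not the paper's.
    """
    coverage = np.asarray(coverage, dtype=np.float32)
    if coverage.shape != (len(sequence),):
        raise ValueError("coverage must have one value per base")
    peak_max = coverage.max() if coverage.size else 0.0
    scaled = coverage / peak_max if peak_max > 0 else coverage
    # Columns 0-3 hold the sequence, column 4 holds the scaled signal.
    return np.concatenate([one_hot(sequence), scaled[:, None]], axis=1)


# Toy peak of five bases with made-up coverage values.
features = hybrid_encoding("ACGTN", [0, 2, 4, 2, 0])
print(features.shape)
```

Each row of `features` corresponds to one base of the peak, so the matrix can be read position by position as a sequence of five-dimensional inputs. Other designs, such as feeding signal and sequence through separate branches, would be equally reasonable under the abstract's description.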

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • A549 Cells
  • DNA-Directed RNA Polymerases / metabolism*
  • HCT116 Cells
  • Humans
  • MCF-7 Cells
  • Neural Networks, Computer
  • Nucleotide Motifs
  • Protein Binding
  • Sequence Analysis, DNA / methods*
  • Transcription Factors / metabolism
  • Transcription Initiation Site*

Substances

  • Transcription Factors
  • DNA-Directed RNA Polymerases