Protocol variations in run-on transcription dataset preparation produce detectable signatures in sequencing libraries

BMC Genomics. 2022 Mar 7;23(1):187. doi: 10.1186/s12864-022-08352-8.

Abstract

Background: A variety of protocols exist for producing whole-genome run-on transcription datasets. However, little is known about how differences between these protocols affect the signal within the resulting libraries.

Results: Using run-on transcription datasets generated from the same biological system, we show that a variety of GRO- and PRO-seq preparation methods leave identifiable signatures within each library. Specifically, we show that the library preparation method results in differences in quality control metrics, as well as differences in the signal distribution at the 5' end of transcribed regions. These shifts lead to disparities in eRNA identification, but do not impact analyses aimed at inferring the key regulators involved in changes to transcription.

Conclusions: Run-on sequencing protocol variations result in technical signatures that can be used to identify both the enrichment and library preparation method of a particular dataset. These technical signatures are batch effects that limit detailed comparisons of pausing ratios and eRNAs identified across protocols. However, these batch effects have only a limited impact on our ability to infer which regulators underlie the observed transcriptional changes.

Keywords: GRO-seq; Library preparation; PRO-seq; Run-on sequencing.

MeSH terms

  • Databases, Genetic
  • Genomic Library*
  • High-Throughput Nucleotide Sequencing* / methods
  • Quality Control
  • Transcription, Genetic