Accurate Quantification of Overlapping Herpesvirus Transcripts from RNA Sequencing Data

J Virol. 2022 Jan 26;96(2):e0163521. doi: 10.1128/JVI.01635-21. Epub 2021 Oct 27.

Abstract

Herpesviruses employ extensive bidirectional transcription of overlapping genes to overcome length constraints on their gene product repertoire. As a consequence, many lytic transcripts cannot be measured individually by reverse transcription-quantitative PCR (RT-qPCR) or conventional RNA sequencing (RNA-seq) analysis. A. G. Bruce, S. Barcy, T. DiMaio, E. Gan, et al. (Pathogens 6:11, 2017, https://doi.org/10.3390/pathogens6010011) have proposed an approximation method using unique coding sequences (UCDS) to estimate lytic gene abundance from Kaposi's sarcoma-associated herpesvirus (KSHV) RNA-seq data. Although UCDS has been widely employed, its accuracy, to our knowledge, has never been rigorously validated for any herpesvirus. In this study, we use cap analysis of gene expression sequencing (CAGE-seq) as a gold standard to determine the accuracy of UCDS for estimating Epstein-Barr virus (EBV) lytic gene expression levels from RNA-seq data. We also introduce the Unique TranScript (UTS) method, which, like UCDS, estimates transcript abundance from changes in mean RNA-seq read depth. UTS is distinguished by its use of empirically determined 5' and 3' transcript ends rather than coding sequence annotations. Compared to conventional read assignment, both UCDS and UTS improved the accuracy of quantitation of overlapping genes, with UTS giving the most accurate results. The UTS method discards fewer reads and may be advantageous for experiments with less sequencing depth. UTS is compatible with any aligner and, unlike isoform-aware alignment methods, can be implemented on a laptop computer. Our findings demonstrate that the accuracy achieved by complex and expensive techniques such as CAGE-seq can be approximated using conventional short-read RNA-seq data when read assignment methods address transcript overlap. Although our study focuses on EBV transcription, the UTS method should be applicable across all herpesviruses as well as to other genomes with extensively overlapping transcriptomes.

IMPORTANCE Many viruses employ extensively overlapping transcript structures. This complexity makes it difficult to quantify gene expression by using conventional methods, including RNA-seq. Although high-throughput techniques that overcome these limitations exist, they are complex, expensive, and scarce in the herpesvirus literature relative to short-read RNA-seq. Here, using Epstein-Barr virus (EBV) as a model, we demonstrate that conventional RNA-seq analysis methods fail to accurately quantify the abundances of many overlapping transcripts. We further show that the previously described Unique CoDing Sequence (UCDS) method and our Unique TranScript (UTS) method greatly improve the accuracy of EBV lytic gene measurements obtained from RNA-seq data. The UTS method has the advantages of discarding fewer reads and being implementable on a laptop computer. Although this study focuses on EBV, the UCDS and UTS methods should be applicable across herpesviruses and to other viruses that make extensive use of overlapping transcription.
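The idea of estimating abundance "from changes in mean RNA-seq read depth" can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simplified case that the abstract does not spell out: same-strand transcripts that are nested and share a 3' end, so that read depth steps up at each successive transcript start. The function names (`mean_depth`, `estimate_abundances`), the half-open coordinate convention, and the toy depth values are all hypothetical choices made for illustration.

```python
from statistics import mean


def mean_depth(depth, start, end):
    """Mean per-base read depth over the half-open interval [start, end)."""
    if end <= start:
        raise ValueError("interval must be non-empty")
    return float(mean(depth[start:end]))


def estimate_abundances(depth, unique_regions):
    """Estimate per-transcript signal from step changes in mean depth.

    depth: per-base read depth for one strand (list of numbers).
    unique_regions: (name, start, end) tuples ordered 5' to 3'. Each region
        runs from one transcript's start to the next transcript's start.

    Assumption (illustrative only): the transcripts are nested and share a
    3' end, so depth in each region is the sum of that transcript and all
    transcripts that start upstream of it.
    """
    estimates = {}
    upstream = 0.0
    for name, start, end in unique_regions:
        current = mean_depth(depth, start, end)
        # The step up from the preceding region is attributed to the
        # transcript that starts here; negative steps are clipped to zero.
        estimates[name] = max(current - upstream, 0.0)
        upstream = current
    return estimates


# Toy data: three nested transcripts, depth rising at each start site.
toy_depth = [10] * 100 + [35] * 100 + [50] * 100
toy_regions = [("A", 0, 100), ("B", 100, 200), ("C", 200, 300)]
print(estimate_abundances(toy_depth, toy_regions))
```

In this toy case, conventional assignment of reads to every overlapping annotation would credit transcript A's reads to B and C as well, whereas the step-change estimate attributes only the added depth to each downstream transcript. Per-base depth could come from any aligner's output, consistent with the statement that the approach is aligner-independent.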

Keywords: EBV; RNA-seq; herpesvirus; transcription.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Epstein-Barr Virus Infections / genetics
  • Epstein-Barr Virus Infections / virology
  • Genome, Viral
  • Herpesviridae / genetics*
  • Herpesvirus 4, Human / genetics
  • Polyadenylation
  • RNA, Viral / genetics
  • Sequence Analysis, RNA / methods*
  • Transcription, Genetic*
  • Transcriptome / genetics
  • Viral Proteins / genetics

Substances

  • RNA, Viral
  • Viral Proteins