Identification and quantification of defective virus genomes in high throughput sequencing data using DVG-profiler, a novel post-sequence alignment processing algorithm

PLoS One. 2019 May 17;14(5):e0216944. doi: 10.1371/journal.pone.0216944. eCollection 2019.

Abstract

Most viruses are known to spontaneously generate defective viral genomes (DVGs) due to errors during replication. These DVGs are subgenomic and contain deletions that render them unable to complete a full replication cycle in the absence of a co-infecting, non-defective helper virus. DVGs, especially those of the copyback type frequently observed with paramyxoviruses, have been recognized as important triggers of the antiviral innate immune response. DVGs have therefore gained interest for their potential to alter the attenuation and immunogenicity of vaccines. To investigate this potential, accurate identification and quantification of DVGs is essential. Conventional methods, such as RT-PCR, are labor-intensive and detect only species that match the primer sequences. High throughput sequencing (HTS) is much better suited for this undertaking. Here, we present an HTS-based algorithm called DVG-profiler to identify and quantify all DVG sequences in an HTS data set generated from a virus preparation. DVG-profiler identifies DVG breakpoints relative to a reference genome and reports the directionality of each segment from within the same read. The specificity and sensitivity of the algorithm were assessed using both in silico data sets and HTS data obtained from parainfluenza virus 5, Sendai virus and mumps virus preparations. HTS data from the latter were also compared with conventional RT-PCR data and with data obtained using an alternative algorithm. The data presented here demonstrate the high specificity, sensitivity, and robustness of DVG-profiler. This algorithm was implemented within an open-source, cloud-based computing environment for analyzing HTS data. DVG-profiler might prove valuable not only in basic virus research but also in monitoring live attenuated vaccines for DVG content and in assuring vaccine lot-to-lot consistency.
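
The idea of reading a breakpoint and segment directionality from a single read can be illustrated with a minimal sketch. This is not the DVG-profiler implementation, which the abstract does not describe in detail. The `Segment` record, the `junction` and `count_junctions` functions, and the three-way labels ("deletion", "copyback", "other") are assumptions made here for illustration only. The sketch assumes each read has already been split by an aligner into two segments, each with reference coordinates given in read order and a strand.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a read (hypothetical record, for illustration)."""
    read_start: int  # 0-based offset of the segment within the read
    ref_start: int   # 1-based reference position of its first base, in read order
    ref_end: int     # 1-based reference position of its last base, in read order
    strand: str      # "+" or "-": direction of the segment relative to the reference


def junction(first, second):
    """Return (break_pos, rejoin_pos, kind) for two segments of the same read."""
    # Order the segments as they occur along the read.
    a, b = sorted((first, second), key=lambda s: s.read_start)
    direction = 1 if a.strand == "+" else -1
    if a.strand != b.strand:
        # The read switches direction relative to the reference.
        kind = "copyback"
    elif (b.ref_start - a.ref_end) * direction > 1:
        # Same direction, but reference bases are skipped between the segments.
        kind = "deletion"
    else:
        kind = "other"
    return (a.ref_end, b.ref_start, kind)


def count_junctions(split_reads):
    """Tally identical junctions across reads as a crude abundance estimate."""
    return Counter(junction(a, b) for a, b in split_reads)


# Toy input: two reads supporting the same deletion and one copyback read.
example_reads = [
    (Segment(0, 100, 149, "+"), Segment(50, 900, 949, "+")),
    (Segment(50, 900, 949, "+"), Segment(0, 100, 149, "+")),
    (Segment(0, 15000, 15049, "+"), Segment(50, 15200, 15151, "-")),
]
junction_counts = count_junctions(example_reads)
```

Counting reads per junction gives only a raw tally. Any normalization against coverage, and any filtering of alignment artifacts, would have to follow the published method rather than this sketch.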

MeSH terms

  • Algorithms*
  • Animals
  • Chromosome Mapping / methods
  • Chromosome Mapping / statistics & numerical data*
  • DNA Primers / chemical synthesis
  • DNA Primers / metabolism
  • Datasets as Topic
  • Defective Viruses / classification
  • Defective Viruses / genetics*
  • Genome, Viral*
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Molecular Typing
  • Mumps virus / classification
  • Mumps virus / genetics*
  • Parainfluenza Virus 5 / classification
  • Parainfluenza Virus 5 / genetics*
  • Real-Time Polymerase Chain Reaction
  • Sendai virus / classification
  • Sendai virus / genetics*
  • Sensitivity and Specificity

Substances

  • DNA Primers

Grants and funding

The authors received no specific funding for this work.