Intra-Host Co-Existing Strains of SARS-CoV-2 Reference Genome Uncovered by Exhaustive Computational Search

Viruses. 2023 Apr 26;15(5):1065. doi: 10.3390/v15051065.

Abstract

The COVID-19 pandemic caused by SARS-CoV-2 has had a severe impact on people worldwide. The reference genome of the virus has been widely used as a template for designing mRNA vaccines to combat the disease. In this study, we present a computational method for identifying co-existing intra-host strains of the virus from the short-read RNA-sequencing data that were used to assemble the original reference genome. Our method consists of five key steps: extraction of relevant reads, error correction of the reads, identification of within-host diversity, phylogenetic analysis, and protein binding affinity analysis. Our study revealed that multiple strains of SARS-CoV-2 can coexist both in the viral sample used to produce the reference sequence and in a wastewater sample from California. Additionally, our workflow demonstrated its capability to identify within-host diversity in foot-and-mouth disease virus (FMDV). Our analyses shed light on the binding affinity and phylogenetic relationships between these strains and the published SARS-CoV-2 reference genome, SARS-CoV, the variants of concern (VOC) of SARS-CoV-2, and some closely related coronaviruses. These insights have important implications for future efforts to identify within-host diversity, to understand the evolution and spread of these viruses, and to develop effective treatments and vaccines against them.

Keywords: SARS-CoV-2; de novo assembly; error correction; phylogeny; spike protein; within-host diversity.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • COVID-19*
  • Genome, Viral
  • Humans
  • Pandemics
  • Phylogeny
  • SARS-CoV-2* / genetics
  • Spike Glycoprotein, Coronavirus / genetics

Substances

  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2

Grants and funding

This research was partially supported by Australian Research Council Discovery Project DP180100120.