Genome-Wide Variation in Betacoronaviruses

J Virol. 2021 Jul 12;95(15):e0049621. doi: 10.1128/JVI.00496-21. Epub 2021 Jul 12.

Abstract

Severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-CoV-2 originated in bats and adapted to infect humans. Several SARS-CoV-2 strains have been identified. Genetic variation is fundamental to virus evolution and, in response to selection pressure, is manifested as the emergence of new strains and species adapted to different hosts or with novel pathogenicity. The combination of variation and selection forms a genetic footprint on the genome, consisting of the preferential accumulation of mutations in particular areas. Properties of betacoronaviruses contributing to variation and the emergence of new strains and species are beginning to be elucidated. To better understand their variation, we profiled the accumulation of mutations in all species in the genus Betacoronavirus, including SARS-CoV-2 and two other species that infect humans: SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV). Variation profiles identified both genetically stable and variable areas at homologous locations across species within the genus Betacoronavirus. The S glycoprotein is the most variable part of the genome and is structurally disordered. Other variable parts include proteins 3 and 7 and ORF8, which participate in replication and suppression of antiviral defense. In contrast, replication proteins in ORF1b are the least variable. Collectively, our results show that variation and structural disorder in the S glycoprotein are general features of all members of the genus Betacoronavirus, including SARS-CoV-2. These findings highlight the potential for the continual emergence of new species and strains with novel biological properties and indicate that the S glycoprotein has a critical role in host adaptation.

IMPORTANCE Natural infection with SARS-CoV-2 and vaccination trigger the formation of antibodies against the S glycoprotein, which are detected by antibody-based diagnostic tests. Our analysis showed that variation in the S glycoprotein is a general feature of all species in the genus Betacoronavirus, including three species that infect humans: SARS-CoV, SARS-CoV-2, and MERS-CoV. The variable nature of the S glycoprotein provides an explanation for the emergence of SARS-CoV-2, the differentiation of SARS-CoV-2 into strains, and the probability of repeated SARS-CoV-2 infections in people. Variation of the S glycoprotein also has important implications for the reliability of SARS-CoV-2 antibody-based diagnostic tests and for the design and deployment of vaccines and antiviral drugs. These findings indicate that adjustments to vaccine design and deployment and to antibody-based diagnostic tests are necessary to account for S glycoprotein variation.
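
The idea of profiling where mutations accumulate along a genome can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the sequences are already aligned to equal length, treats any column with more than one observed nucleotide as polymorphic, and summarizes the fraction of polymorphic sites in sliding windows. The function names, the window and step sizes, and the toy sequences are all hypothetical.

```python
from typing import List, Tuple


def polymorphic_sites(aligned: List[str]) -> List[bool]:
    """Flag alignment columns where more than one nucleotide is observed.

    Gap characters ('-') and ambiguous bases ('N') are ignored.
    Assumes every sequence has already been aligned to the same length.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    flags = []
    for col in range(length):
        bases = {seq[col].upper() for seq in aligned} - {"-", "N"}
        flags.append(len(bases) > 1)
    return flags


def window_variation(
    aligned: List[str], window: int, step: int
) -> List[Tuple[int, float]]:
    """Return (window start, fraction of polymorphic sites) along the alignment."""
    flags = polymorphic_sites(aligned)
    profile = []
    for start in range(0, len(flags) - window + 1, step):
        chunk = flags[start:start + window]
        profile.append((start, sum(chunk) / window))
    return profile


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not real viral genomes).
    toy = ["ACGTACGTAC", "ACGTACGAAC", "ACGTTCGCAC"]
    for start, fraction in window_variation(toy, window=4, step=2):
        print(start, fraction)
```

Windows with a high fraction correspond to variable areas and windows near zero to stable ones; on real genomes the window would span hundreds of nucleotides and the alignment would come from a dedicated aligner.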

Keywords: COVID-19; MERS-CoV; S protein; SARS-CoV; SARS-CoV-2; coronavirus; genomic variation; glycoprotein S; protein S; vaccine.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Betacoronavirus / genetics*
  • Evolution, Molecular*
  • Genetic Variation*
  • Genome, Viral*
  • Genome-Wide Association Study
  • Humans
  • Spike Glycoprotein, Coronavirus / genetics*

Substances

  • Spike Glycoprotein, Coronavirus