Genome characterization based on the Spike-614 and NS8-84 loci of SARS-CoV-2 reveals two major possible onsets of the COVID-19 pandemic

PLoS One. 2023 Jun 15;18(6):e0279221. doi: 10.1371/journal.pone.0279221. eCollection 2023.

Abstract

The global COVID-19 pandemic has lasted for 3 years since its outbreak; however, its origin is still unknown. Here, we analyzed the genotypes of 3.14 million SARS-CoV-2 genomes based on amino acid 614 of the Spike (S) protein and amino acid 84 of NS8 (nonstructural protein 8), and identified 16 linkage haplotypes. The GL haplotype (S_614G and NS8_84L) was the major haplotype driving the global pandemic and accounted for 99.2% of the sequenced genomes, while the DL haplotype (S_614D and NS8_84L) caused the pandemic in China in the spring of 2020 and accounted for approximately 60% of the genomes in China and 0.45% of the global genomes. The GS (S_614G and NS8_84S), DS (S_614D and NS8_84S), and NS (S_614N and NS8_84S) haplotypes accounted for 0.26%, 0.06%, and 0.0067% of the genomes, respectively. The main evolutionary trajectory of SARS-CoV-2 is DS→DL→GL, whereas the other haplotypes are minor byproducts of this evolution. Surprisingly, the newest haplotype, GL, had the oldest time of most recent common ancestor (tMRCA), with a mean of May 1, 2019, while the oldest haplotype, DS, had the newest tMRCA, with a mean of October 17, indicating that the ancestral strains that gave rise to GL had gone extinct and been replaced by a better-adapted newcomer at the place of its origin, just like the sequential rise and fall of the delta and omicron variants. However, the DL haplotype arrived and evolved into toxic strains, igniting a pandemic in China, where the GL strains had not yet arrived by the end of 2019. The GL strains had spread all over the world before they were discovered and ignited the global pandemic, which went unnoticed until the virus was declared in China. However, the GL haplotype had little influence in China during the early phase of the pandemic because of its late arrival and the strict transmission controls in China. Therefore, we propose two major onsets of the COVID-19 pandemic: one driven mainly by the DL haplotype in China, the other driven by the GL haplotype globally.
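
The two-locus labeling described in the abstract can be illustrated with a minimal sketch. It assumes the residues at S position 614 and NS8 position 84 have already been called for each genome. The function names and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def haplotype(spike_614: str, ns8_84: str) -> str:
    """Build the two-letter label: S residue 614 followed by NS8 residue 84."""
    return f"{spike_614.upper()}{ns8_84.upper()}"


def haplotype_shares(calls):
    """Map each haplotype label to its percentage of the input genomes."""
    counts = Counter(haplotype(s, n) for s, n in calls)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical residue calls, one (S_614, NS8_84) pair per genome.
# These are toy values for illustration, not data from the paper.
toy_calls = [("G", "L")] * 6 + [("D", "L")] * 2 + [("D", "S"), ("G", "S")]
shares = haplotype_shares(toy_calls)
```

With real data, the same tally over all genomes would yield shares of the kind reported above (for example, GL at 99.2%).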

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / genetics
  • COVID-19* / epidemiology
  • COVID-19* / genetics
  • Genome, Viral / genetics
  • Humans
  • Pandemics
  • Phylogeny
  • SARS-CoV-2 / genetics
  • SARS-CoV-2 / metabolism
  • Spike Glycoprotein, Coronavirus / chemistry

Substances

  • Amino Acids
  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2

Supplementary concepts

  • SARS-CoV-2 variants

Grants and funding

This research was supported by grants from the National Key R&D Program of China and the Central Public-interest Scientific Institution Basal Research Fund to J.Z. (1630052020022), and the Project of Science and Technology Department of Sichuan Provincial of China to L.Y. (2019JDJQ0035). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.