The extent of molecular variation in novel SARS-CoV-2 after the six-month global spread

Infect Genet Evol. 2021 Jul:91:104800. doi: 10.1016/j.meegid.2021.104800. Epub 2021 Mar 5.

Abstract

The pandemic spread of coronavirus disease 2019 (COVID-19) has been ongoing since severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was identified as the etiologic pathogen in late December 2019. After more than six months of spread, SARS-CoV-2 poses a critical threat to global public health and the economy. Because investigating the evolution and genotyping of its genetic variants is of great importance, the present study characterized the molecular variation of SARS-CoV-2 by analyzing 4230 complete genome sequences from samples collected worldwide during the first six months of the pandemic. Phylogenetic tree analysis with the Neighbor-Joining and Maximum-Parsimony methods indicated that the haplotypes of SARS-CoV-2 genome sequences fell into four clades with unique nucleotide and amino acid changes: T27879C (ORF8 L84S) in clade 1 (25.34%), A23138G (spike D614G) in clade 2 (63.54%), G10818T (nsp6 L37F), C14540T (nsp12 T442I), and G25879T (ORF3a V251F) in clade 3 (2.58%), and miscellaneous changes in clade 4 (8.54%). Interestingly, subclade 2B, with the amino acid changes nsp2 T85I, Spike D614G, and ORF3a Q57H, was first reported on March 4, 2020 in the United States of America and became the most frequent sub-haplogroup in the world (36.21%) and America (45.81%). Subclade 1C, with the amino acid changes nsp13 P504L and ORF8 L84S, became the second most frequent sub-haplogroup in the world (19.91%) and America (26.29%). Subclade 2A, with the amino acid changes Spike D614G and Nucleocapsid R203K and G204R, was highly prevalent in Asia (18.82%) and Europe (29.72%). The study highlights the notable clades and subclades with unique mutations, revealing their genetic and geographical relevance after the first six months of the COVID-19 outbreak. By thoroughly examining the genetic features of SARS-CoV-2 haplotypes, this study provides an epidemiological trend of COVID-19.
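The marker-based grouping described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which relied on phylogenetic trees. It is only a rule-of-thumb lookup that uses the amino acid changes and percentages quoted above. The function names, the first-match rule order, and the requirement that a genome carry all three clade 3 changes are assumptions made for illustration.

```python
# Marker amino acid changes copied from the abstract.
# Assumption: clades are tested in the order 1, 2, 3, and the first match wins.
# Assumption: clade 3 requires all three of its listed changes.
CLADE_MARKERS = [
    ("1", {"ORF8 L84S"}),
    ("2", {"Spike D614G"}),
    ("3", {"nsp6 L37F", "nsp12 T442I", "ORF3a V251F"}),
]

# Subclades named in the abstract, with their defining changes.
SUBCLADE_MARKERS = {
    "2B": {"nsp2 T85I", "Spike D614G", "ORF3a Q57H"},
    "1C": {"nsp13 P504L", "ORF8 L84S"},
    "2A": {"Spike D614G", "Nucleocapsid R203K", "Nucleocapsid G204R"},
}

# Clade shares (percent of the 4230 genomes) as reported in the abstract.
CLADE_SHARE = {"1": 25.34, "2": 63.54, "3": 2.58, "4": 8.54}


def assign_clade(changes):
    """Return the first clade whose markers are all present, else clade 4."""
    observed = set(changes)
    for clade, markers in CLADE_MARKERS:
        if markers <= observed:  # every marker is among the observed changes
            return clade
    return "4"  # "miscellaneous changes" in the abstract


def assign_subclade(changes):
    """Return a named subclade if all of its markers are present, else None."""
    observed = set(changes)
    for name, markers in SUBCLADE_MARKERS.items():
        if markers <= observed:
            return name
    return None


if __name__ == "__main__":
    genome = ["nsp2 T85I", "Spike D614G", "ORF3a Q57H"]  # hypothetical input
    print(assign_clade(genome), assign_subclade(genome))
    # Sanity check: the four reported clade shares account for all genomes.
    print(round(sum(CLADE_SHARE.values()), 2))
```

For the hypothetical genome above, the sketch reports clade 2 and subclade 2B, and the four reported clade shares sum to 100%.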

Keywords: COVID-19; Clade; Epidemiological trend; Genetic variation; Phylogenetic tree; SARS-CoV-2.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Americas / epidemiology
  • Amino Acid Substitution
  • Asia / epidemiology
  • COVID-19 / epidemiology*
  • COVID-19 / transmission
  • COVID-19 / virology
  • Europe / epidemiology
  • Evolution, Molecular
  • Gene Expression Regulation, Viral
  • Genetic Variation*
  • Genome, Viral*
  • Haplotypes
  • Humans
  • Mutation Rate
  • Nucleocapsid Proteins / genetics*
  • Open Reading Frames
  • Phylogeny
  • SARS-CoV-2 / classification
  • SARS-CoV-2 / genetics*
  • Selection, Genetic
  • Spike Glycoprotein, Coronavirus / genetics*

Substances

  • Nucleocapsid Proteins
  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2