Molecular evolutionary characteristics of SARS-CoV-2 emerging in the United States

J Med Virol. 2022 Jan;94(1):310-317. doi: 10.1002/jmv.27331. Epub 2021 Sep 20.

Abstract

SARS-CoV-2 is a highly pathogenic betacoronavirus, first identified at the end of 2019, that poses a serious threat to human health. In this study, 1875 SARS-CoV-2 whole-genome sequences sampled in the United States, together with the spike protein coding sequences (S gene), were analyzed with bioinformatics methods to characterize the molecular evolution of the genome and the spike protein. A Markov chain Monte Carlo (MCMC) method was used to estimate the evolutionary rate of the whole genome and the nucleotide mutation rate of the S gene. The nucleotide mutation rate was 6.677 × 10⁻⁴ substitutions per site per year for the whole genome and 8.066 × 10⁻⁴ substitutions per site per year for the S gene. These rates are moderate compared with those of other RNA viruses. Our findings confirmed the hypothesis that the evolutionary rate of the virus gradually decreases over time. We also identified 13 sites under statistically significant positive selection in the SARS-CoV-2 genome. In addition, the amino acid sequence of the S protein contained 101 nonsynonymous mutation sites, including seven putatively harmful mutation sites. This study provides a preliminary characterization of the evolutionary features of SARS-CoV-2 in the United States and a scientific basis for future surveillance and prevention of virus variants.
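To make the reported rates and the synonymous/nonsynonymous distinction more concrete, here is a short Python sketch with two parts. First, it converts a per-site rate into the expected number of substitutions per genome (or per S gene) per year using a simple clock approximation. Second, it labels a single codon change as synonymous or nonsynonymous using the standard genetic code. This is not the authors' pipeline, because their rates were estimated by MCMC rather than by this arithmetic. The sequence lengths are assumptions not stated in the paper: 29,903 nt for the Wuhan-Hu-1 reference genome and 3,822 nt for the S coding sequence including the stop codon. All helper names are hypothetical.

```python
# Hypothetical helpers illustrating quantities mentioned in the abstract.
# ASSUMPTION: lengths come from the Wuhan-Hu-1 reference (GenBank NC_045512.2),
# not from this paper.
GENOME_LENGTH_NT = 29_903   # assumed reference genome length
S_GENE_LENGTH_NT = 3_822    # assumed S coding sequence length, incl. stop codon


def expected_substitutions(rate_per_site_per_year: float,
                           length_nt: int,
                           years: float = 1.0) -> float:
    """Expected substitutions along one lineage under a simple clock: rate x sites x time."""
    return rate_per_site_per_year * length_nt * years


# Standard genetic code (NCBI translation table 1), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous or nonsynonymous ('*' denotes a stop codon)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


if __name__ == "__main__":
    genome = expected_substitutions(6.677e-4, GENOME_LENGTH_NT)
    spike = expected_substitutions(8.066e-4, S_GENE_LENGTH_NT)
    print(f"whole genome: {genome:.1f} substitutions/year")
    print(f"S gene:       {spike:.2f} substitutions/year")
    print(classify_codon_change("GAT", "GGT"))  # Asp -> Gly: nonsynonymous
    print(classify_codon_change("GAT", "GAC"))  # Asp -> Asp: synonymous
```

Under these assumptions, the whole-genome rate corresponds to roughly 20 substitutions per genome per year, and the S-gene rate corresponds to about 3 per year. The GAT to GGT example is the well-known D614G spike change (codon 614). It is included only as an illustration and is not one of the sites reported in this paper.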

Keywords: SARS-CoV-2; bioinformatics; molecular evolution; spike protein.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence / genetics
  • COVID-19 / epidemiology*
  • COVID-19 / pathology
  • Computational Biology
  • Evolution, Molecular*
  • Genome, Viral / genetics*
  • Humans
  • Mutation Rate
  • SARS-CoV-2 / genetics*
  • Spike Glycoprotein, Coronavirus / genetics*
  • United States / epidemiology
  • Whole Genome Sequencing

Substances

  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2