Assessment of positive selection across SARS-CoV-2 variants via maximum likelihood

PLoS One. 2023 Sep 14;18(9):e0291271. doi: 10.1371/journal.pone.0291271. eCollection 2023.

Abstract

Study of the genome of the SARS-CoV-2 virus, particularly with regard to understanding the evolution of the virus, is crucial for managing the COVID-19 pandemic. To this end, we sample viral genomes from the GISAID repository and use several of the maximum likelihood approaches implemented in PAML, a collection of open-source programs for phylogenetic analyses of DNA and protein sequences, to assess evidence for positive selection in the protein-coding regions of the SARS-CoV-2 genome. Across all major variants identified by June 2021, we find limited evidence for positive selection. In particular, we identify positive selection in a small proportion of sites (5-15%) in the protein-coding region of the spike protein across variants. Most other variants did not show a strong signal for positive selection overall, though there were indications of positive selection in the Delta and Kappa variants for the nucleocapsid protein. We additionally use a forward selection procedure to fit a model that allows branch-specific estimates of selection along a phylogeny relating the variants, and find that there is variation in the selective pressure across variants for the spike protein. Our results highlight the utility of computational approaches for identifying genomic regions under selection.
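
The abstract does not say which PAML models were compared, so the following is only a minimal sketch of how maximum likelihood evidence for positive selection is commonly assessed: a likelihood ratio test between two nested codon site models. It assumes the comparison is between the M1a (nearly neutral) and M2a (positive selection) site models of PAML's codeml, which differ by two free parameters, and it uses the conventional chi-square reference distribution with 2 degrees of freedom. The function name and the log-likelihood values are hypothetical and are not taken from the paper.

```python
import math


def lrt_two_df(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test for nested models that differ by 2 parameters.

    lnl_null: maximized log-likelihood of the null model (e.g. M1a).
    lnl_alt:  maximized log-likelihood of the alternative model (e.g. M2a).
    Returns (test statistic, p-value).
    """
    # The statistic is twice the log-likelihood difference; clamp at zero in
    # case numerical optimization leaves the alternative slightly worse.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For 2 degrees of freedom the chi-square survival function has the
    # closed form exp(-x / 2), so no statistics library is needed.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods, chosen only to illustrate the calculation.
lnl_m1a = -10250.40  # null: no class of sites with omega > 1
lnl_m2a = -10243.15  # alternative: adds a class of sites with omega > 1

stat, p_value = lrt_two_df(lnl_m1a, lnl_m2a)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

A small p-value would favor the model that allows a class of sites with a nonsynonymous-to-synonymous rate ratio above 1. The closed-form p-value applies only when the models differ by exactly two parameters; other model pairs would need a general chi-square routine.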

MeSH terms

  • COVID-19* / genetics
  • Humans
  • Likelihood Functions
  • Pandemics
  • Phylogeny
  • SARS-CoV-2* / genetics
  • Spike Glycoprotein, Coronavirus / genetics

Substances

  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2

Supplementary concepts

  • SARS-CoV-2 variants

Grants and funding

The authors received no specific funding for this work.