Evolutionary analysis and lineage designation of SARS-CoV-2 genomes

Sci Bull (Beijing). 2021 Nov 30;66(22):2297-2311. doi: 10.1016/j.scib.2021.02.012. Epub 2021 Feb 6.

Abstract

The pandemic driven by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiological agent of coronavirus disease 2019 (COVID-19), has caused immense global disruption. With the rapid accumulation of SARS-CoV-2 genome sequences, thousands of genomic variants are now publicly available. To better trace the evolution of the viral genome over the course of the pandemic, we analyzed single nucleotide variants (SNVs) in 121,618 high-quality SARS-CoV-2 genomes. We divided these genomes into two major lineages (L and S) based on variants at sites 8782 and 28144, and further divided the L lineage into two major sublineages (L1 and L2) using SNVs at sites 3037, 14408, and 23403. We then categorized the genomes into 130 sublineages (37 in S, 35 in L1, and 58 in L2) based on marker SNVs at 201 additional genomic sites. This lineage/sublineage designation system is hierarchical and reflects the relatedness among the subclades of the major lineages. We also provide a companion website (www.covid19evolution.net) that allows users to visualize sublineage information and upload their own SARS-CoV-2 genomes for sublineage classification. Finally, we discussed the possible roles of compensatory mutations and natural selection in SARS-CoV-2's evolution. These efforts will improve our understanding of the temporal and spatial dynamics of SARS-CoV-2 genome evolution.
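The hierarchical designation described above (L/S from sites 8782 and 28144, then L1/L2 from sites 3037, 14408, and 23403) can be illustrated as a walk down a tree of marker SNVs. The following is a minimal sketch, not the authors' classifier or the companion website's code. It assumes genomes are already aligned to reference coordinates (1-based, no indels shifting positions). The abstract names the marker sites but not the alleles, so the alleles in ASSUMED_TREE are hypothetical placeholders; in particular, which haplotype corresponds to L1 versus L2 is an assumption made for illustration. All names (LineageNode, classify, genotype_from_aligned, ASSUMED_TREE) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# A genotype maps 1-based reference-genome positions to the observed base.
Genotype = Dict[int, str]


@dataclass
class LineageNode:
    """One node of a hierarchical lineage tree defined by marker SNVs."""
    name: str
    markers: Dict[int, str] = field(default_factory=dict)  # position -> required base
    children: List["LineageNode"] = field(default_factory=list)

    def matches(self, genotype: Genotype) -> bool:
        # A genome enters this node only if it carries every marker allele.
        return all(genotype.get(pos) == base for pos, base in self.markers.items())


def genotype_from_aligned(seq: str, positions: Iterable[int]) -> Genotype:
    """Read bases at 1-based positions from a sequence that is ASSUMED to be
    aligned to reference coordinates; positions past the end are skipped."""
    return {pos: seq[pos - 1].upper() for pos in positions if 1 <= pos <= len(seq)}


def classify(genotype: Genotype, root: LineageNode) -> List[str]:
    """Descend the tree and return the path of lineage names.

    Stops at the deepest level where exactly one child matches, so a genome
    with missing or conflicting markers is assigned only as deep as is safe.
    """
    path: List[str] = []
    node = root
    while node.children:
        hits = [child for child in node.children if child.matches(genotype)]
        if len(hits) != 1:
            break  # no match or ambiguous match: stop at the last confident level
        node = hits[0]
        path.append(node.name)
    return path


# HYPOTHETICAL marker alleles for illustration only: the abstract gives the
# sites, not the alleles, and the L1/L2 allele assignment here is assumed.
ASSUMED_TREE = LineageNode("root", children=[
    LineageNode("S", {8782: "T", 28144: "C"}),
    LineageNode("L", {8782: "C", 28144: "T"}, children=[
        LineageNode("L1", {3037: "C", 14408: "C", 23403: "A"}),
        LineageNode("L2", {3037: "T", 14408: "T", 23403: "G"}),
    ]),
])
```

For example, under these assumed alleles, `classify({8782: "C", 28144: "T", 3037: "T", 14408: "T", 23403: "G"}, ASSUMED_TREE)` returns `["L", "L2"]`, while a genome carrying only the L markers stops at `["L"]`. The 130 finer sublineages defined by the 201 additional sites would correspond to further child nodes under S, L1, and L2; this sketch does not reproduce them.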

Keywords: Adaptive evolution; COVID-19; Compensatory advantageous mutation; Evolutionary analysis; Lineage designation.