The status and analysis of common mutations found in the SARS-CoV-2 whole genome sequences from Bangladesh

Gene Rep. 2022 Jun:27:101608. doi: 10.1016/j.genrep.2022.101608. Epub 2022 Apr 4.

Abstract

The rapid emergence of COVID-19 variants through continuous mutation has driven successive waves of infection worldwide and, as a result, a very large death toll. It is therefore important to investigate the diversity and nature of the mutations in SARS-CoV-2 genomes. In this study, the common mutations that occurred in the whole genome sequences of SARS-CoV-2 variants from Bangladesh over a defined timeline were analyzed to better understand their status. A total of 78 complete genome sequences available in the NCBI database were obtained, aligned and analyzed. Single Nucleotide Polymorphisms (SNPs) were scattered throughout the genomes of the variants, and common SNPs such as 241:C>T in the 5'UTR upstream of Open Reading Frame 1A (ORF1A), 3037:C>T in Non-structural Protein 3 (NSP3), 14,408:C>T in ORF6, and 23,402:A>G and 23,403:A>G in the Spike (S) protein were observed, but all of them were synonymous mutations. About 97% of the studied genomes carried a block trinucleotide substitution (GGG>AAC) at genome positions 28,881-28,883, the most common non-synonymous mutation observed. This block changes two amino acids (203-204: RG>KR) in the SR-rich motif of the nucleocapsid (N) protein of SARS-CoV-2, introducing a lysine between serine and arginine. The structure of the mutant N protein was predicted through protein modeling, but no observable difference was found between the mutant and the reference (Wuhan) protein. Protein stability changes upon mutation were then analyzed with the I-Mutant2.0 tool. The arginine-to-lysine substitution at amino acid position 203 showed a reduction of entropy, suggesting a possible impact on the overall stability of the N protein.
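As a rough illustration of how common SNPs can be tallied from an alignment and how a block substitution is classified as synonymous or non-synonymous, here is a minimal Python sketch. It assumes every genome is already aligned to the reference column for column. The helper names (`common_snps`, `apply_block`, `translate`), the `min_fraction` threshold, and the 6-nt reference context `AGGGGA` assumed for N codons 203-204 are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate an in-frame nucleotide string into one-letter amino acid codes."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def common_snps(reference: str, genomes: list[str], min_fraction: float) -> dict[str, float]:
    """Return 'pos:ref>alt' SNPs (1-based positions) present in >= min_fraction of genomes.

    Assumes every genome is already aligned to the reference, column for column.
    """
    counts = Counter()
    for genome in genomes:
        for pos, (ref, alt) in enumerate(zip(reference, genome), start=1):
            if alt != ref and alt in "ACGT":  # ignore gaps ('-') and ambiguous bases ('N')
                counts[f"{pos}:{ref}>{alt}"] += 1
    return {snp: n / len(genomes) for snp, n in counts.items() if n / len(genomes) >= min_fraction}


def apply_block(window: str, offset: int, new_bases: str) -> str:
    """Substitute a run of bases (a block mutation) starting at a 0-based offset."""
    return window[:offset] + new_bases + window[offset + len(new_bases):]


# Assumed reference context for N codons 203-204: AGG (R) GGA (G).
# The GGG>AAC block starts at the second base of codon 203 (genome positions 28,881-28,883).
ref_window = "AGGGGA"
mut_window = apply_block(ref_window, 1, "AAC")  # "AAACGA"
print(translate(ref_window), "->", translate(mut_window))  # RG -> KR
```

A substitution counts as non-synonymous when the translated window changes, as it does for this block; SNPs in untranslated regions such as the 5'UTR have no codon context and would need separate handling.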
The non-synonymous to synonymous substitution ratio (dN/dS) was estimated for the common mutations, and the overall mean distance among the N-protein variants was statistically significant, supporting the non-synonymous nature of the mutations. Phylogenetic analysis of the 78 selected genomes, compared with the most common genomic variants of this virus across the globe, showed a distinct cluster for the analyzed Bangladeshi sequences. Further studies are warranted to establish any plausible association of these mutations with clinical manifestations.
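The abstract does not say which dN/dS estimator or software was used. The sketch below assumes the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, applied to two in-frame, equal-length, gap-free coding sequences. The function names are hypothetical, and changes to stop codons are counted as non-synonymous sites, which is one convention among several.

```python
import math
from itertools import permutations, product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_sites(codon: str) -> tuple[float, float]:
    """Synonymous and non-synonymous sites of one codon (changes to a stop count as non-synonymous)."""
    synonymous = 0
    for i in range(3):
        for base in "ACGT":
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    synonymous += 1
    s = synonymous / 3
    return s, 3 - s


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous/non-synonymous differences averaged over all stop-free mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    total_s = total_n = 0.0
    valid_paths = 0
    for order in permutations(positions):
        current, s, n, valid = c1, 0, 0, True
        for i in order:
            step = current[:i] + c2[i] + current[i + 1:]
            if CODE[step] == "*":  # pathway passes through a stop codon: discard it
                valid = False
                break
            if CODE[step] == CODE[current]:
                s += 1
            else:
                n += 1
            current = step
        if valid:
            total_s += s
            total_n += n
            valid_paths += 1
    if valid_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return total_s / valid_paths, total_n / valid_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86_dnds(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (dN, dS) for two in-frame, equal-length, gap-free coding sequences."""
    S = N = Sd = Nd = 0.0
    for k in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            continue  # skip codon pairs involving stop codons
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        S += (s1 + s2) / 2  # sites are averaged over the two sequences
        N += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

For example, `ng86_dnds("AGGGGA", "AAACGA")` compares the assumed reference and mutant windows for N codons 203-204. The ratio dN/dS is undefined when dS is 0, and on a window this short the estimates are only illustrative; a genome-scale analysis would pool many codons and test significance separately.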

Keywords: +ssRNA, positive single-stranded RNA; ACE2, Angiotensin-Converting Enzyme 2; Block mutation; CDK, Cyclin Dependent Kinases; COX2, Cyclooxygenase 2; CTD, C-terminal Domain; CoVs, Coronaviruses; Common mutations; DGHS, Directorate General of Health Services; ECM, Extracellular Matrix Protein; ERGIC, ER-Golgi intermediate compartment; GSK3, Glycogen Synthase Kinase 3; IRF3, Interferon Regulatory Factor 3; NFkB, Nuclear Factor kappa B; NSP, Nonstructural Protein; NTD, N-terminal Domain; ORFs, Open Reading Frames; PLP, Papain-like Protease; RBD, Receptor-Binding Domain; RTC, Replication–Transcription Complex; RdRp, RNA-dependent RNA polymerase; SARS-CoV-2; SNP, Single Nucleotide Polymorphism; SR rich motif; TMPRSS2, Transmembrane Protease Serine 2; sgRNAs, Sub-genomic RNAs.