Comparative analysis of human coronaviruses focusing on nucleotide variability and synonymous codon usage patterns

Genomics. 2021 Jul;113(4):2177-2188. doi: 10.1016/j.ygeno.2021.05.008. Epub 2021 May 19.

Abstract

The ongoing COVID-19 pandemic has drawn the scientific community's attention to the evolutionary origin of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). This study is a comprehensive quantitative analysis of the protein-coding sequences of seven human coronaviruses (HCoVs), aimed at deciphering their nucleotide sequence variability and codon usage patterns. Such an analysis is essential for understanding the survival ability of the viruses, their adaptation to hosts, and their evolution. The analysis revealed a high relative abundance (odds ratio) of the dinucleotides GC and CT in the first two and last two codon positions, respectively, and a low relative abundance of CG in the last two codon positions, which might be related to the evolution of the viruses. GC content at the third codon position (GC3) varied remarkably among the seven coronaviruses. Codons with high relative synonymous codon usage (RSCU) values are primarily from the aliphatic and hydroxyl amino acid groups, whereas codons with low RSCU values belong to the aliphatic, cyclic, positively charged, and sulfur-containing amino acid groups. To elucidate the evolutionary processes of the seven coronaviruses, a phylogenetic tree (dendrogram) was constructed from the RSCU scores of the codons. CoVs in the severe and mild categories were positioned in different clades. A comparative phylogenetic study with other coronaviruses showed that SARS-CoV-2 is close to the CoVs isolated from pangolins (Manis javanica, Pangolin-CoV) and cats (Felis catus, SARS(r)-CoV). Further analysis of codon usage bias using the effective number of codons (ENC) showed relatively higher bias for SARS-CoV and MERS-CoV than for SARS-CoV-2. The plot of ENC against GC3 suggested that mutational bias might play a role in determining the codon usage variation among the candidate viruses. A codon adaptability study of several parasites of the human host (from different kingdoms), including CoVs, showed diverse adaptability patterns. SARS-CoV-2 and SARS-CoV exhibit codon adaptability similar to each other and relatively lower than that of MERS-CoV.
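
As an illustration of the quantities named in the abstract, the following is a minimal sketch of how RSCU, GC3, and the expected ENC curve used in an ENC-versus-GC3 plot are commonly computed. It follows the standard textbook definitions and is not necessarily the pipeline used in the study. The function names, the toy sequence, and the choice to compute GC3 over all sense codons (some tools exclude Met and Trp) are assumptions made for the example.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count in-frame codons of a coding sequence; unknown codons are skipped."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(counts):
    """RSCU = observed codon count / mean count over its synonymous family."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def gc3(counts):
    """Fraction of sense codons with G or C at the third position (assumed variant)."""
    sense = {c: n for c, n in counts.items() if CODON_TABLE[c] != "*"}
    total = sum(sense.values())
    if total == 0:
        return float("nan")
    return sum(n for c, n in sense.items() if c[2] in "GC") / total


def expected_enc(s):
    """Wright's (1990) expected ENC when only GC3 composition (s) shapes codon usage."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


# Toy example: four alanine codons (hypothetical sequence, not viral data).
toy_counts = count_codons("GCTGCTGCCGCA")
toy_rscu = rscu(toy_counts)
toy_gc3 = gc3(toy_counts)
```

In an ENC-versus-GC3 plot, genes lying on or near the curve given by expected_enc are usually read as being shaped mainly by mutational bias, while genes well below it suggest additional pressures such as selection.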

Keywords: Amino acid; Codon; Coronaviruses; Nucleotide; Phylogeny; RSCU.

MeSH terms

  • Base Composition / genetics
  • COVID-19 / genetics*
  • COVID-19 / virology
  • Codon / genetics
  • Codon Usage / genetics*
  • Computational Biology
  • Evolution, Molecular*
  • Genome, Viral / genetics
  • Humans
  • Nucleotides / genetics
  • Pandemics
  • SARS-CoV-2 / genetics*
  • SARS-CoV-2 / pathogenicity

Substances

  • Codon
  • Nucleotides