Sequence analysis of SARS-CoV-2 genome reveals features important for vaccine design

Sci Rep. 2020 Sep 24;10(1):15643. doi: 10.1038/s41598-020-72533-2.

Abstract

As the SARS-CoV-2 pandemic progresses rapidly, the development of an effective vaccine is critical. A promising approach to vaccine development is to generate an attenuated virus through codon pair deoptimization. This approach has the advantage of requiring little virus-specific knowledge beyond the genome sequence. It is therefore well suited to emerging viruses, for which extensive data may not be available. We performed comprehensive in silico analyses of several features of the SARS-CoV-2 genomic sequence (e.g., codon usage, codon pair usage, dinucleotide/junction dinucleotide usage, RNA structure around the frameshift region) in comparison with other members of the Coronaviridae family, the overall human genome, and the transcriptomes of specific human tissues, such as lung, that are primarily targeted by the virus. Our analysis identified the spike (S) and nucleocapsid (N) proteins as promising targets for deoptimization and suggests a roadmap for SARS-CoV-2 vaccine development that may be generalizable to other viruses.
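
To make the sequence features named in the abstract concrete, the following is a minimal sketch of how codon usage, codon pair usage, and junction dinucleotide usage could be tallied for a single in-frame coding sequence. It is not the authors' pipeline: the function names are hypothetical, and the toy sequence is made up for illustration and is not taken from the SARS-CoV-2 genome. A junction dinucleotide is taken here to mean the third base of one codon followed by the first base of the next.

```python
from collections import Counter


def codons(seq):
    """Split an in-frame coding sequence into codons.

    RNA input (U) is converted to DNA letters (T), and any trailing
    partial codon is dropped.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_usage(seq):
    """Count how often each codon occurs."""
    return Counter(codons(seq))


def codon_pair_usage(seq):
    """Count how often each pair of adjacent codons occurs."""
    cs = codons(seq)
    return Counter(zip(cs, cs[1:]))


def junction_dinucleotides(seq):
    """Count dinucleotides spanning codon boundaries (position 3 + position 1)."""
    cs = codons(seq)
    return Counter(a[2] + b[0] for a, b in zip(cs, cs[1:]))


# Toy sequence (an assumption for illustration only): ATG GCT GCT AAA TAA
toy = "ATGGCTGCTAAATAA"

usage = codon_usage(toy)
pairs = codon_pair_usage(toy)
junctions = junction_dinucleotides(toy)
```

In a comparison like the one the abstract describes, such raw counts would typically be normalized and compared against reference tables for the host genome or a tissue transcriptome. That step is omitted here because the abstract does not specify the scoring used.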

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Betacoronavirus / genetics*
  • COVID-19
  • COVID-19 Vaccines
  • Coronavirus Infections / immunology
  • Coronavirus Infections / prevention & control*
  • Coronavirus Nucleocapsid Proteins
  • Genome, Viral / genetics
  • Humans
  • Nucleocapsid Proteins / genetics*
  • Nucleocapsid Proteins / immunology
  • Pandemics / prevention & control*
  • Phosphoproteins
  • Pneumonia, Viral / prevention & control*
  • SARS-CoV-2
  • Spike Glycoprotein, Coronavirus / genetics*
  • Spike Glycoprotein, Coronavirus / immunology
  • Vaccines, Inactivated / immunology
  • Viral Vaccines / immunology*
  • Whole Genome Sequencing

Substances

  • COVID-19 Vaccines
  • Coronavirus Nucleocapsid Proteins
  • Nucleocapsid Proteins
  • Phosphoproteins
  • Spike Glycoprotein, Coronavirus
  • Vaccines, Inactivated
  • Viral Vaccines
  • nucleocapsid phosphoprotein, SARS-CoV-2
  • spike protein, SARS-CoV-2