High-Integrity Sequencing of Spike Gene for SARS-CoV-2 Variant Determination

Int J Mol Sci. 2022 Mar 17;23(6):3257. doi: 10.3390/ijms23063257.

Abstract

For tiling the SARS-CoV-2 genome, the ARTIC Network's V4 protocol, which uses 99 primer pairs for amplicon production, is currently the most widely used amplicon-based approach. However, this technique leaves regions of low sequence coverage and is labour-, time-, and cost-intensive. Moreover, it requires 14 primer pairs in two separate PCRs to obtain spike gene sequences. To overcome these disadvantages, we propose a single PCR that efficiently detects spike gene mutations. We also propose a bioinformatic protocol that processes FASTQ reads into spike gene consensus sequences, either to accurately call spike protein variants from sequenced samples or to properly report cases of missing amplicons. In silico, primer sets yielding amplicons of 400, 1200, and 2500 bp for spike gene sequencing of SARS-CoV-2 had detection rates of 59.49%, 76.19%, and 92.20%, respectively, whereas our proposed single-PCR primers reached 97.07%. We demonstrated the robustness of our analytical protocol on 3000 Oxford Nanopore sequencing runs from distinct datasets, thus ensuring high-integrity sequencing of spike genes for SARS-CoV-2 variant determination. Our protocol works well with data generated from a variety of primer designs, making spike protein variants easy to determine.
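The abstract does not detail how reads become consensus sequences or how missing amplicons are reported. The following is a minimal illustrative sketch of the general idea, not the authors' pipeline: it assumes a simplified, hypothetical input in which each reference position is given as a string of observed read bases, and the function names and thresholds (`min_depth=20`, `min_freq=0.6`, `max_n_fraction=0.5`) are hypothetical choices. Positions with low coverage or no clear majority are masked as `N` rather than guessed, so an amplicon that failed to amplify shows up as a mostly-`N` region instead of a false reference call.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_consensus(pileup: List[str], min_depth: int = 20, min_freq: float = 0.6) -> str:
    """Build a consensus string from per-position read bases.

    pileup[i] holds the bases observed at reference position i. This is a
    hypothetical, simplified format; real pipelines parse BAM/pileup files.
    """
    consensus = []
    for bases in pileup:
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too few reads: mask instead of guessing
            continue
        base, n = counts.most_common(1)[0]
        # Keep the majority base only if it is sufficiently dominant.
        consensus.append(base if n / depth >= min_freq else "N")
    return "".join(consensus)


def missing_amplicons(consensus: str, amplicons: Dict[str, Tuple[int, int]],
                      max_n_fraction: float = 0.5) -> List[str]:
    """Name amplicons (0-based, end-exclusive spans) that are mostly masked."""
    missing = []
    for name, (start, end) in amplicons.items():
        region = consensus[start:end]
        if region and region.count("N") / len(region) > max_n_fraction:
            missing.append(name)
    return missing


def substitutions(reference: str, consensus: str) -> List[str]:
    """List substitutions as e.g. 'G3A' (1-based), skipping masked positions."""
    return [f"{r}{i + 1}{c}"
            for i, (r, c) in enumerate(zip(reference, consensus))
            if c != "N" and c != r]


if __name__ == "__main__":
    reference = "ACGTAC"
    pileup = ["A" * 30, "C" * 30, "A" * 25 + "G" * 5, "T" * 5, "A" * 30, "C" * 15 + "T" * 15]
    cons = call_consensus(pileup)
    print(cons)                                   # ACANAN
    print(substitutions(reference, cons))         # ['G3A']
    print(missing_amplicons(cons, {"amp1": (0, 3), "amp2": (3, 6)}))  # ['amp2']
```

In the toy example, position 4 is masked for low depth and position 6 for an ambiguous 50/50 split, so only the well-supported G-to-A change is called, and the second (hypothetical) amplicon is reported as missing rather than silently treated as matching the reference.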

Keywords: SARS-CoV-2; nanopore sequencing; spike gene; variant.

MeSH terms

  • COVID-19 / virology*
  • Computational Biology
  • Genome, Viral
  • Genomics / methods
  • Humans
  • Mutation
  • Mutation Rate
  • Phylogeny
  • SARS-CoV-2 / classification
  • SARS-CoV-2 / genetics*
  • Sequence Analysis, DNA
  • Spike Glycoprotein, Coronavirus / genetics*

Substances

  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2

Supplementary concepts

  • SARS-CoV-2 variants