A fast and accurate method for SARS-CoV-2 genomic tracing

Brief Bioinform. 2023 Sep 22;24(6):bbad339. doi: 10.1093/bib/bbad339.

Abstract

To contain infectious diseases, it is crucial to determine the origin and transmission routes of the pathogen, as well as how the virus evolves. With the development of genome sequencing technology, genome epidemiology has emerged as a powerful approach for investigating the source and transmission of pathogens. In this study, we first presented the rationale for genomic tracing of SARS-CoV-2 and the challenges we currently face. Identifying the most genetically similar reference sequence to the query sequence is a critical step in genome tracing, typically achieved using either a phylogenetic tree or a sequence similarity search. However, these methods become inefficient or computationally prohibitive when dealing with tens of millions of sequences in the reference database, as we encountered during the COVID-19 pandemic. To address this challenge, we developed a novel genomic tracing algorithm capable of processing 6 million SARS-CoV-2 sequences in less than a minute. Instead of constructing a giant phylogenetic tree, we devised a weighted scoring system based on mutation characteristics to quantify sequences similarity. The developed method demonstrated superior performance compared to previous methods. Additionally, an online platform was developed to facilitate genomic tracing and visualization of the spatiotemporal distribution of sequences. The method will be a valuable addition to standard epidemiological investigations, enabling more efficient genomic tracing. Furthermore, the computational framework can be easily adapted to other pathogens, paving the way for routine genomic tracing of infectious diseases.

Keywords: SARS-CoV-2; genomic tracing; phylogeny.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / epidemiology
  • COVID-19* / genetics
  • Genome, Viral
  • Genomics / methods
  • Humans
  • Pandemics
  • Phylogeny
  • SARS-CoV-2* / genetics