Scalable Reconstruction of SARS-CoV-2 Phylogeny with Recurrent Mutations

J Comput Biol. 2021 Nov;28(11):1130-1141. doi: 10.1089/cmb.2021.0306. Epub 2021 Oct 25.

Abstract

This article presents SPHERE (Scalable PHylogEny with REcurrent mutations), a novel scalable character-based phylogeny algorithm for dense viral sequencing data. The algorithm is based on an evolutionary model in which recurrent mutations are allowed but backward mutations are prohibited. It constructs rooted character-based phylogenetic trees in which all leaves and internal nodes are labeled by observed taxa. We show that the SPHERE phylogeny is more stable than Nextstrain's and that it accurately infers known transmission links from the early pandemic. SPHERE is fast: it can process >200,000 sequences in <2 hours, and it offers a compact phylogenetic visualization of the sequences in the Global Initiative on Sharing All Influenza Data (GISAID) database.
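The evolutionary model described above can be illustrated with a small Python sketch. This is not the SPHERE algorithm, whose details the abstract does not give. It is a naive quadratic-time toy built on assumptions: each taxon is represented as a set of mutations relative to a reference, the root is the reference with an empty mutation set, and each taxon is attached to the observed taxon carrying the largest proper subset of its mutations. The names `assign_parents` and `edge_gains` and the mutation labels `m1` to `m3` are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Optional

Mutations = FrozenSet[str]


def assign_parents(taxa: Dict[str, Mutations], root: str) -> Dict[str, Optional[str]]:
    """Attach every taxon to another observed taxon (toy heuristic, not SPHERE).

    Assumes taxa[root] is the empty set. A valid parent must carry a proper
    subset of the child's mutations, so no edge ever loses a mutation
    (no backward mutations). Among valid parents, the one with the most
    mutations is chosen; ties go to the alphabetically first name.
    """
    parent: Dict[str, Optional[str]] = {root: None}
    names = sorted(taxa)
    for name in names:
        if name == root:
            continue
        best, best_size = root, -1
        for cand in names:
            # `<` on frozensets tests for a proper subset.
            if cand != name and taxa[cand] < taxa[name] and len(taxa[cand]) > best_size:
                best, best_size = cand, len(taxa[cand])
        parent[name] = best
    return parent


def edge_gains(taxa: Dict[str, Mutations], parent: Dict[str, Optional[str]]) -> Counter:
    """Count on how many edges each mutation is newly gained."""
    gains: Counter = Counter()
    for child, par in parent.items():
        if par is not None:
            gains.update(taxa[child] - taxa[par])
    return gains


# Hypothetical toy data: mutation m3 appears on two different backgrounds.
taxa = {
    "ref": frozenset(),
    "A": frozenset({"m1"}),
    "B": frozenset({"m2"}),
    "C": frozenset({"m1", "m3"}),
    "D": frozenset({"m2", "m3"}),
}
parent = assign_parents(taxa, root="ref")
gains = edge_gains(taxa, parent)
recurrent = sorted(m for m, n in gains.items() if n > 1)
print(parent)
print(recurrent)
```

In this toy input, every node of the resulting tree, internal or leaf, is an observed taxon, and a mutation gained on more than one edge is reported as recurrent. A method meant for hundreds of thousands of sequences would need a far more efficient parent search than the pairwise scan used here.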

Keywords: SARS-CoV-2 sequences; phylogenetic tree inference; recurrent mutations.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • COVID-19 / transmission
  • COVID-19 / virology
  • Databases, Genetic
  • Humans
  • Mutation*
  • Phylogeny*
  • SARS-CoV-2 / genetics*