Going back to the roots: Evaluating Bayesian phylogeographic models with discrete trait uncertainty

Infect Genet Evol. 2020 Nov:85:104501. doi: 10.1016/j.meegid.2020.104501. Epub 2020 Aug 13.

Abstract

Phylogeography is a popular way to analyze virus sequences annotated with discrete, epidemiologically relevant trait data. For applied public health surveillance, a key quantity of interest is often the state at the root of the inferred phylogeny, which in epidemiological terms represents the geographic origin of the observed outbreak. Since determining the origin of an outbreak is often critical for public health intervention, it is prudent to understand how well phylogeographic models perform this root state classification task under various analytical scenarios. Specifically, we investigate how the size of the discrete state space and the size of the sequence data set influence root state classification accuracy. We performed phylogeographic inference on several simulated DNA data sets while i) increasing the number of sequences and ii) increasing the total number of possible discrete trait values. We show that phylogeographic models tend to perform best at intermediate sequence data set sizes. We also demonstrate that a popular metric used to evaluate phylogeographic models, the Kullback-Leibler (KL) divergence, increases with both the size of the discrete state space and the size of the data set. Further, by modeling root state classification accuracy using logistic regression, we show that KL is not supported as a predictor of model accuracy, indicating its limited utility for assessing phylogeographic model performance on empirical data. These results suggest that relying solely on the KL metric may lead to artificially inflated support for models with finer discretization schemes and larger data sets. They will be important for public health practitioners seeking to use phylogeographic models for applied infectious disease surveillance.

Keywords: Bayesian statistics; Model evaluation; Phylogenetics; Phylogeography.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • Disease Outbreaks / statistics & numerical data*
  • Genetic Variation*
  • Guidelines as Topic
  • Humans
  • Models, Genetic
  • Models, Theoretical
  • Phenotype
  • Phylogeny*
  • Phylogeography / methods*
  • Research Design / standards*
  • Virus Diseases / epidemiology*
  • Virus Diseases / genetics*