The effects of random taxa sampling schemes in Bayesian virus phylogeography

Infect Genet Evol. 2018 Oct:64:225-230. doi: 10.1016/j.meegid.2018.07.003. Epub 2018 Jul 4.

Abstract

Public health researchers are often tasked with using a representative sample of nucleotide sequences to identify, accurately and quickly, where and when an epidemic originated. In this paper, we investigate multiple approaches to subsampling the sequence set when employing a Bayesian phylogeographic generalized linear model. Our results indicate that near-categorical posterior MCC estimates on the root can be obtained with replicate runs using 25-50% of the sequence data, and that including 90% of sequences does not necessarily entail more accurate inferences. We present the first analysis of predictor signal suppression and show how the ability to detect the influence of predictor variables is limited when sample-size predictors are included in the models.
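
As a rough illustration of replicate random subsampling, the sketch below draws several uniform random subsets of a taxon list at a chosen fraction. It is not the authors' procedure: the abstract does not describe the sampling schemes in detail, so uniform sampling without replacement, the function name `subsample_taxa`, and the taxon labels are all assumptions made for illustration.

```python
import random


def subsample_taxa(taxa, fraction, n_replicates, seed=0):
    """Return n_replicates uniform random subsets of taxa (hypothetical helper).

    Each subset holds about fraction * len(taxa) taxa, drawn without
    replacement. A fixed seed makes the replicates reproducible.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(fraction * len(taxa)))
    return [sorted(rng.sample(taxa, k)) for _ in range(n_replicates)]


# Hypothetical taxon labels; real labels would come from the sequence alignment.
taxa = [f"seq_{i:03d}" for i in range(40)]

# Three replicate subsets at each of the fractions mentioned in the abstract.
for fraction in (0.25, 0.50):
    replicates = subsample_taxa(taxa, fraction, n_replicates=3, seed=1)
    print(fraction, [len(r) for r in replicates])
```

Each replicate subset would then be analyzed in a separate phylogeographic run, so that root estimates can be compared across replicates and across sampling fractions.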

Keywords: Phylogeography; Selection Bias; Viruses.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem*
  • Databases, Genetic
  • Epidemics
  • Humans
  • Phylogeny*
  • Phylogeography*
  • United States / epidemiology
  • Virus Diseases / epidemiology
  • Virus Diseases / virology
  • Viruses / classification*
  • Viruses / genetics*