Sampling bias and model choice in continuous phylogeography: Getting lost on a random walk

PLoS Comput Biol. 2021 Jan 6;17(1):e1008561. doi: 10.1371/journal.pcbi.1008561. eCollection 2021 Jan.

Abstract

Phylogeographic inference allows reconstruction of past geographical spread of pathogens or living organisms by integrating genetic and geographic data. A popular model in continuous phylogeography, with location data provided in the form of latitude and longitude coordinates, describes spread as a Brownian motion (Brownian Motion Phylogeography, BMP) in continuous space and time, akin to similar models of continuous trait evolution. Here, we show that reconstructions using this model can be strongly affected by sampling biases, such as the lack of sampling from certain areas. As an attempt to reduce the effects of sampling bias on BMP, we consider the addition of sequence-free samples from under-sampled areas. While this approach alleviates the effects of sampling bias, in most scenarios this will not be a viable option due to the need for prior knowledge of an outbreak's spatial distribution. We therefore consider an alternative model, the spatial Λ-Fleming-Viot process (ΛFV), which has recently gained popularity in population genetics. Despite the ΛFV's robustness to sampling biases, we find that the different assumptions of the ΛFV and BMP models result in different applicabilities, with the ΛFV being more appropriate for scenarios of endemic spread, and BMP being more appropriate for recent outbreaks or colonizations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Computational Biology
  • Disease Outbreaks / statistics & numerical data
  • Flavivirus / genetics
  • Flavivirus Infections / epidemiology
  • Flavivirus Infections / virology
  • Genetics, Population / methods*
  • Humans
  • Markov Chains
  • Models, Genetic*
  • Phylogeography / methods*
  • Selection Bias*

Grants and funding

AK and YS were supported by the Cambridge Mathematics Placements (CMP, https://www.maths.cam.ac.uk/opportunities/careers-for-mathematicians/summer-research-mathematics/summer-research-mathematics-cmp-and-research-cms). GB was supported by the Interne Fondsen KU Leuven / Internal Funds KU Leuven (https://www.kuleuven.be/english/research/support/if) under grant agreement C14/18/094, and by the Research Foundation - Flanders ('Fonds voor Wetenschappelijk Onderzoek - Vlaanderen', https://www.fwo.be/G0E1420N). SG was supported by the Agence Nationale pour la Recherche (https://anr.fr/) through the grant GENOSPACE. AK, UP, YS, NG and NDM were supported by the European Molecular Biology Laboratory. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.