Bayesian inference of phylogenetic networks from bi-allelic genetic markers

PLoS Comput Biol. 2018 Jan 10;14(1):e1005932. doi: 10.1371/journal.pcbi.1005932. eCollection 2018 Jan.

Abstract

Phylogenetic networks are rooted, directed, acyclic graphs that model reticulate evolutionary histories. Recently, statistical methods were devised for inferring such networks from either gene tree estimates or the sequence alignments of multiple unlinked loci. Bi-allelic markers, most notably single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), provide a powerful source of genome-wide data. In a recent paper, a method called SNAPP was introduced for statistical inference of species trees from unlinked bi-allelic markers. The generative process assumed by the method combines a model of evolution for the bi-allelic markers with the multispecies coalescent. A novel component of the method was a polynomial-time algorithm for exact computation of the likelihood of a fixed species tree via integration over all possible gene trees for a given marker. Here we report on a method for Bayesian inference of phylogenetic networks from bi-allelic markers. Our method significantly extends that algorithm to compute the exact likelihood of a phylogenetic network, again via integration over all possible gene trees. Unlike the case of species trees, the algorithm is no longer polynomial-time on all instances of phylogenetic networks. Furthermore, the method uses a reversible-jump MCMC technique to sample the posterior of phylogenetic networks given bi-allelic marker data. As we demonstrate on simulated data, as well as on a data set of multiple New Zealand species of the plant genus Ourisia (Plantaginaceae), the method performs very well in terms of accuracy and robustness. We implemented the method in the publicly available, open-source PhyloNet software package.
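
The abstract refers to "a model of evolution for the bi-allelic markers" without spelling it out. One standard way to model a marker with two alleles is a two-state continuous-time Markov chain, and the sketch below computes its transition probabilities in closed form. This is an illustration under stated assumptions, not the paper's or PhyloNet's implementation: the function name and the rate parameters `u` (allele 0 to allele 1) and `v` (allele 1 to allele 0) are hypothetical names chosen here.

```python
import math


def two_state_transition_matrix(u, v, t):
    """Transition probabilities of a two-state Markov chain after time t.

    Assumed (hypothetical) parameterization:
      u -- instantaneous rate of change from allele 0 to allele 1
      v -- instantaneous rate of change from allele 1 to allele 0
      t -- elapsed time (branch length), in units matching the rates

    Returns a 2x2 nested list P where P[i][j] is the probability of
    observing allele j at time t given allele i at time 0.
    """
    if u <= 0 or v <= 0:
        raise ValueError("rates u and v must be positive")
    if t < 0:
        raise ValueError("time t must be non-negative")

    total = u + v
    pi0 = v / total  # stationary frequency of allele 0
    pi1 = u / total  # stationary frequency of allele 1
    decay = math.exp(-total * t)  # how much of the starting state is remembered

    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]
```

Each row of the returned matrix sums to 1, the matrix is the identity at `t = 0`, and both rows approach the stationary frequencies `(v / (u + v), u / (u + v))` as `t` grows. Integrating over gene trees and sampling networks, the parts the abstract highlights as the method's contribution, are beyond the scope of this sketch.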

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Alleles
  • Bayes Theorem
  • Computational Biology
  • Computer Simulation
  • Genes, Plant*
  • Genetic Markers*
  • Likelihood Functions
  • Models, Genetic
  • New Zealand
  • Nucleic Acid Hybridization
  • Phylogeny*
  • Plantaginaceae / genetics*
  • Plantaginaceae / physiology
  • Polymorphism, Single Nucleotide
  • Probability
  • Recombination, Genetic
  • Software

Substances

  • Genetic Markers

Grants and funding

This work was supported by DBI-1355998 and CCF-1302179 to LN from the National Science Foundation (http://www.nsf.gov). This work was supported in part by the Data Analysis and Visualization Cyberinfrastructure funded by NSF under grant OCI-0959097 and Rice University, and by the Big-Data Private-Cloud Research Cyberinfrastructure MRI-award funded by NSF under grant CNS-1338099 and Rice University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.