Bayesian inference of ancestral recombination graphs

PLoS Comput Biol. 2022 Mar 9;18(3):e1009960. doi: 10.1371/journal.pcbi.1009960. eCollection 2022 Mar.

Abstract

We present a novel algorithm, implemented in the software ARGinfer, for probabilistic inference of the Ancestral Recombination Graph under the Coalescent with Recombination. Our Markov Chain Monte Carlo algorithm takes advantage of the Succinct Tree Sequence data structure, which has allowed great advances in simulation and point estimation but has not yet been used for probabilistic inference. Unlike previous methods, which employ the Sequentially Markov Coalescent approximation, ARGinfer uses the Coalescent with Recombination, allowing more accurate inference of key evolutionary parameters. We show using simulations that ARGinfer can accurately estimate many properties of the evolutionary history of the sample, including the topology and branch lengths of the genealogical tree at each sequence site, and the times and locations of mutation and recombination events. ARGinfer approximates posterior probability distributions for these and other quantities, providing interpretable assessments of uncertainty that we show to be well calibrated. ARGinfer is currently limited to tens of DNA sequences of several hundred kilobases, but has scope for further computational improvements to increase its applicability.
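
For readers unfamiliar with the Coalescent with Recombination, the following minimal Python sketch simulates only the sequence of event types and waiting times in the standard backward-in-time process. It is an illustration under stated assumptions, not ARGinfer's algorithm or API: the function name `simulate_arg_events` and its parameters are hypothetical, time is assumed to be in coalescent units, `rho` is assumed to be the population-scaled recombination rate for the whole sequence, and the sketch does not track which segments of the sequence each lineage carries.

```python
import random


def simulate_arg_events(n, rho, seed=None):
    """Simulate event times and types for a simplified ARG process.

    Hypothetical helper for illustration only (not part of ARGinfer).
    Assumes that, with k lineages, coalescence occurs at rate k(k-1)/2
    and recombination at rate k*rho/2, in coalescent time units.
    Ancestral material is not tracked, so every lineage may recombine.
    """
    rng = random.Random(seed)
    k = n          # current number of lineages
    t = 0.0        # time before the present
    events = []    # (time, event type, lineages after the event)
    while k > 1:
        coal_rate = k * (k - 1) / 2.0
        rec_rate = k * rho / 2.0
        total = coal_rate + rec_rate
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        # Choose the event type in proportion to its rate.
        if rng.random() < coal_rate / total:
            k -= 1  # two lineages merge into one
            events.append((t, "coalescence", k))
        else:
            k += 1  # one lineage splits into two parental lineages
            events.append((t, "recombination", k))
    return events


if __name__ == "__main__":
    for time, kind, lineages in simulate_arg_events(n=5, rho=1.0, seed=1):
        print(f"{time:.4f}  {kind:<13}  lineages={lineages}")
```

Because coalescence reduces the lineage count by one and recombination increases it by one, every run from `n` samples contains exactly `n - 1` more coalescences than recombinations. Large values of `rho` make the loop run much longer, so small values are advisable when experimenting.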

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Markov Chains
  • Models, Genetic*
  • Phylogeny
  • Recombination, Genetic / genetics
  • Software*

Grants and funding

A.M. was funded by the Melbourne Research Scholarship, the Xing Lei Scholarship, the Professor Maurice H. Belz Fund, and the Albert Shimmins Fund. J. Koskela was supported by the UK Engineering and Physical Sciences Research Council grant EP/R044732/1. J. Kelleher was supported by the Robertson Foundation. D.B., Y.C. and J. Kelleher were supported by the Australian Research Council grant DP210102168. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.