Model-Based Detection of Whole-Genome Duplications in a Phylogeny

Mol Biol Evol. 2020 Sep 1;37(9):2734-2746. doi: 10.1093/molbev/msaa111.

Abstract

Ancient whole-genome duplications (WGDs) leave signatures in comparative genomic data sets that can be harnessed to detect these events of presumed evolutionary importance. Current statistical approaches for the detection of ancient WGDs in a phylogenetic context have two main drawbacks. The first is that unwarranted restrictive assumptions on the "background" gene duplication and loss rates make inferences unreliable in the face of model violations. The second is that most methods can only be used to examine a limited set of a priori selected WGD hypotheses and cannot be used to discover WGDs in a phylogeny. In this study, we develop an approach for WGD inference using gene count data that seeks to overcome both issues. We employ a phylogenetic birth-death model that includes WGD in a flexible hierarchical Bayesian approach and use reversible-jump Markov chain Monte Carlo to perform Bayesian inference of branch-specific duplication, loss, and WGD retention rates across the space of WGD configurations. We evaluate the proposed method using simulations, apply it to data sets from flowering plants, and discuss the statistical intricacies of model-based WGD inference.

Keywords: Bayesian inference; gene content evolution; genome duplication; phylogenetics; reversible-jump MCMC.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Chromosome Duplication
  • Computer Simulation
  • Gene Duplication*
  • Genetic Techniques*
  • Genome*
  • Magnoliopsida / genetics
  • Markov Chains
  • Models, Genetic*
  • Monte Carlo Method
  • Phylogeny*
  • Polyploidy