Partitioned coalescence support reveals biases in species-tree methods and detects gene trees that determine phylogenomic conflicts

Mol Phylogenet Evol. 2019 Oct:139:106539. doi: 10.1016/j.ympev.2019.106539. Epub 2019 Jun 18.

Abstract

Genomic datasets sometimes support conflicting phylogenetic relationships when different tree-building methods are applied. Coherent interpretations of such results are enabled by partitioning support for controversial relationships among the constituent genes of a phylogenomic dataset. For the supermatrix (=concatenation) approach, several methods that measure the distribution of support and conflict among loci were introduced over 15 years ago. More recently, partitioned coalescence support (PCS) was developed for phylogenetic coalescence methods that account for incomplete lineage sorting and use the summed fits of gene trees to estimate the species tree. Here, we automate computation of PCS to permit application of this index to genome-scale matrices that include hundreds of loci. Reanalyses of four phylogenomic datasets for amniotes, land plants, skinks, and angiosperms demonstrate how PCS scores can be used to: (1) compare conflicting results favored by alternative coalescence methods, (2) identify outlier gene trees that have a disproportionate influence on the resolution of contentious relationships, (3) assess the effects of missing data in species-tree analysis, and (4) clarify biases in commonly implemented coalescence methods and support indices. We show that key phylogenomic conclusions from these analyses often hinge on just a few gene trees and that results can be driven by specific biases of a particular coalescence method and/or the differential weight placed on gene trees with high versus low taxon sampling. The attribution of exceptionally high weight to some gene trees and very low weight to other gene trees counters the basic logic of phylogenomic coalescence analysis; even clades in species trees with high support according to commonly used indices (likelihood-ratio test, bootstrap, Bayesian local posterior probability) can be unstable to the removal of only one or two gene trees with high PCS.
Computer simulations cannot adequately describe all of the contingencies and complexities of empirical genetic data. PCS scores complement simulation work by providing specific insights into a particular dataset given the assumptions of the phylogenetic coalescence method that is applied. In combination with standard measures of nodal support, PCS provides a more complete understanding of the overall genomic evidence for contested evolutionary relationships in species trees.
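
The idea of partitioning support among gene trees can be illustrated with a minimal sketch. It assumes that a coalescence method has already produced a per-gene fit score for each of two competing species trees (for example, a per-gene contribution to a summed optimality score). The function names, gene labels, and numbers below are hypothetical and are not taken from the paper or its software; the sketch only shows the arithmetic of taking per-gene differences in fit and checking how few genes need to be removed before the preferred tree loses its advantage.

```python
from typing import Dict, List

def partitioned_support(fit_a: Dict[str, float],
                        fit_b: Dict[str, float]) -> Dict[str, float]:
    """Per-gene support for species tree A over tree B.

    Each value is the gene's fit under A minus its fit under B, so a
    positive score favors A and a negative score favors B.
    """
    return {gene: fit_a[gene] - fit_b[gene] for gene in fit_a}

def genes_to_flip(pcs: Dict[str, float]) -> List[str]:
    """Greedily drop the genes most favorable to tree A until the summed
    support for A is no longer positive; return the genes dropped."""
    total = sum(pcs.values())
    removed: List[str] = []
    for gene, score in sorted(pcs.items(), key=lambda kv: kv[1], reverse=True):
        if total <= 0 or score <= 0:
            break
        total -= score
        removed.append(gene)
    return removed

# Hypothetical per-gene fit scores for two candidate species trees.
fit_tree_a = {"g1": 120, "g2": 95, "g3": 100, "g4": 98}
fit_tree_b = {"g1": 100, "g2": 100, "g3": 98, "g4": 101}

pcs = partitioned_support(fit_tree_a, fit_tree_b)
total_support = sum(pcs.values())
influential = genes_to_flip(pcs)
```

In this toy input, tree A is favored overall, yet nearly all of that advantage comes from a single gene; removing it reverses the preference. This mirrors, in miniature, the kind of instability to the removal of one or two gene trees that the abstract describes.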

Keywords: ASTRAL; Coalescence; Gene tree; Incomplete lineage sorting; MP-EST; Species tree.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Bayes Theorem
  • Bias
  • Biological Evolution
  • Computer Simulation
  • Genes
  • Genomics
  • Lizards / classification
  • Lizards / genetics
  • Magnoliopsida / classification
  • Magnoliopsida / genetics
  • Phylogeny*
  • Plants / classification
  • Plants / genetics
  • Probability