GHOST: Recovering Historical Signal from Heterotachously Evolved Sequence Alignments

Syst Biol. 2020 Mar 1;69(2):249-264. doi: 10.1093/sysbio/syz051.

Abstract

Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths, and substitution model parameters from heterotachously evolved sequences. We investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic data set composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a data set composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to elucidate a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model offers unique biological insights when applied to empirical data.
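To make the mixture-model idea concrete, the following is a minimal sketch, not the IQ-TREE implementation. It assumes that each mixture class carries its own branch lengths on one shared topology and that the likelihood of a site is the weighted sum of its likelihoods under each class. For brevity it uses a two-taxon tree and the JC69 substitution model, both of which are simplifications introduced here for illustration. The function names and the example sequences, weights, and branch lengths are hypothetical.

```python
import math


def jc69_prob(same: bool, t: float) -> float:
    """JC69 transition probability along a branch of length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def site_likelihood(x: str, y: str, t: float) -> float:
    """Likelihood of observing states x and y at one site on a two-taxon
    tree with total branch length t, using uniform base frequencies."""
    return 0.25 * jc69_prob(x == y, t)


def mixture_log_likelihood(seq_a, seq_b, weights, branch_lengths):
    """Log-likelihood of a pairwise alignment under a mixture of classes.

    Every class shares the same (here trivial) topology but has its own
    branch length; each site's likelihood is the weighted sum over classes.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if len(weights) != len(branch_lengths):
        raise ValueError("need one branch length per class")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("class weights must sum to 1")

    total = 0.0
    for x, y in zip(seq_a, seq_b):
        site = sum(
            w * site_likelihood(x, y, t)
            for w, t in zip(weights, branch_lengths)
        )
        total += math.log(site)
    return total


if __name__ == "__main__":
    # Hypothetical toy data: two classes, one slowly and one quickly evolving.
    print(mixture_log_likelihood("ACGTACGT", "ACGAACTT", [0.7, 0.3], [0.05, 0.8]))
```

In a full implementation the per-class site likelihood would be computed over the whole tree, for example with Felsenstein's pruning algorithm, and the weights, branch lengths, and substitution parameters would be estimated jointly by maximum likelihood.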

Keywords: Convergent evolution; heterotachy; maximum likelihood; mixture model; phylogenetics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Classification / methods*
  • Evolution, Molecular
  • Fishes / classification
  • Fishes / genetics
  • Models, Genetic
  • Phylogeny
  • Sequence Alignment / methods*
  • Software*

Associated data

  • Dryad/10.5061/dryad.t389h81