Uncertain-tree: discriminating among competing approaches to the phylogenetic analysis of phenotype data

Mark N Puttick; Joseph E O'Reilly; Alastair R Tanner; James F Fleming; James Clark; Lucy Holloway; Jesus Lozano-Fernandez; Luke A Parry; James E Tarver; Davide Pisani; Philip C J Donoghue

doi:10.1098/rspb.2016.2290

Proc Biol Sci. 2017 Jan 11;284(1846):20162290. doi: 10.1098/rspb.2016.2290.

Affiliations

¹ School of Earth Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TQ, UK.
² Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD, UK.
³ School of Biological Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TQ, UK.
⁴ School of Earth Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TQ, UK; davide.pisani@bristol.ac.uk.
⁵ School of Earth Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TQ, UK; phil.donoghue@bristol.ac.uk.

Abstract

Morphological data provide the only means of classifying the majority of life's history, but the choice between competing phylogenetic methods for the analysis of morphology is unclear. Traditionally, parsimony methods have been favoured, but recent studies have shown that these approaches are less accurate than the Bayesian implementation of the Mk model. Here we expand on these findings in several ways: we assess the impact of tree shape, evaluate maximum-likelihood estimation under the Mk model, and analyse data composed of both binary and multistate characters. We find that all methods struggle to resolve deep clades correctly within asymmetric trees and when analysing small character matrices. The Bayesian Mk model is the most accurate method for estimating topology, but it yields lower resolution than the other methods. Equal weights parsimony is more accurate than implied weights parsimony, and maximum-likelihood estimation under the Mk model is the least accurate method. We conclude that the Bayesian implementation of the Mk model should be the default method for phylogenetic estimation from phenotype datasets, and we explore the implications of our simulations by reanalysing several empirical morphological character matrices. A consequence of our findings is that neither high resolution nor confident classification of species or groups should be expected from small datasets. It is now necessary to depart from the traditional parsimony paradigm of constructing character matrices and move towards datasets constructed explicitly for Bayesian methods.
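For readers unfamiliar with the Mk model (Lewis 2001), the sketch below shows its equal-rates transition probabilities for a character with k states: the probability of no net change along a branch of length v (in expected changes per character) is 1/k + ((k-1)/k)e^(-kv/(k-1)), and each of the other k-1 states has probability (1/k)(1 - e^(-kv/(k-1))). This is a minimal illustration of the standard model only, not the simulation or inference code used in this study; the function names `mk_transition_matrix` and `two_taxon_likelihood` are hypothetical.

```python
import math


def mk_transition_matrix(k: int, v: float) -> list[list[float]]:
    """Transition probabilities under the equal-rates Mk model (illustrative sketch).

    k: number of character states (k >= 2)
    v: branch length in expected number of changes per character
    """
    if k < 2:
        raise ValueError("Mk needs at least two states")
    if v < 0:
        raise ValueError("branch length must be non-negative")
    decay = math.exp(-k * v / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * decay  # probability of ending in the starting state
    p_diff = 1.0 / k - decay / k            # probability of ending in any one other state
    return [[p_same if i == j else p_diff for j in range(k)] for i in range(k)]


def two_taxon_likelihood(x: int, y: int, k: int, v: float) -> float:
    """Likelihood of tip states x and y joined by one branch of length v.

    Assumes the uniform (1/k) root distribution, which is the stationary
    distribution of the equal-rates Mk model. Hypothetical helper.
    """
    return (1.0 / k) * mk_transition_matrix(k, v)[x][y]


if __name__ == "__main__":
    for row in mk_transition_matrix(3, 0.2):
        print([round(p, 4) for p in row])
    print(two_taxon_likelihood(0, 1, 2, 0.5))
```

At v = 0 the matrix is the identity, and as v grows every entry approaches 1/k, so long branches erase the signal of the starting state. Morphological matrices usually record only variable characters, so practical analyses typically apply Lewis's Mkv correction for this ascertainment bias; the sketch omits it.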

Keywords: Bayesian; cladistics; morphology; palaeontology; parsimony; phylogeny.

MeSH terms

Bayes Theorem
Likelihood Functions
Phenotype*
Phylogeny*
Uncertainty*

Associated data

figshare/10.6084/m9.figshare.c.3653186
