Effects of missing data and data type on phylotranscriptomic analysis of stony corals (Cnidaria: Anthozoa: Scleractinia)

Mol Phylogenet Evol. 2019 May:134:12-23. doi: 10.1016/j.ympev.2019.01.012. Epub 2019 Jan 22.

Abstract

Across the tree of life, phylogenetic analysis is increasingly being performed using transcriptome data. As a result of heterogeneous gene expression within individual organisms and unequal sequencing depth between samples, coverage of homologous loci in such datasets is typically inhomogeneous. Consequently, missing data are a common feature of phylotranscriptomic inference, but their impact on phylogenetic analysis remains poorly characterised empirically. Considering the complexity of the evolutionary history of stony corals (Cnidaria: Anthozoa: Scleractinia), transcriptome data hold great promise for resolving their phylogeny, particularly if there is a good understanding of missing data and data type (either amino acid or DNA) effects. Here, we reconstructed a broad phylogenetic tree of 39 scleractinian species with 3 corallimorpharians as outgroups, including 15 transcriptomes that were newly sequenced and assembled in this study. Between 63 and 505 loci were used to analyse the scleractinian phylogeny, and we quantified differences in tree topology, tree shape, bootstrap support and effects of conflicting gene trees among datasets of varying completeness for both amino acid and DNA sequences. Even with almost 70% missing data, tree topologies appear to be mostly unaffected, although there are higher incongruence levels in the less complete datasets. Furthermore, DNA trees outperform amino acid trees in bootstrap support and robustness against incongruent loci. Overall, our findings indicate that high levels of missing data can still produce expected tree topologies, but identifying and omitting incongruent loci can lead to more consistent branch length estimates.
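
As an illustration of what "datasets of varying completeness" can mean in practice, the following minimal Python sketch computes the proportion of missing taxon-by-locus cells in an occupancy table and subsets loci by a minimum taxon occupancy. It is not the authors' pipeline: the function names, the `min_occupancy` threshold and the toy taxa and loci are all hypothetical, and missing data are counted here per locus rather than per alignment site.

```python
def missing_fraction(occupancy, taxa):
    """Return the fraction of empty taxon-by-locus cells.

    occupancy: dict mapping locus name -> set of taxa with a sequence for it
    taxa: iterable of all taxa in the dataset
    (Both structures are illustrative assumptions, not from the paper.)
    """
    taxa = set(taxa)
    total_cells = len(taxa) * len(occupancy)
    if total_cells == 0:
        raise ValueError("need at least one taxon and one locus")
    filled_cells = sum(len(set(present) & taxa) for present in occupancy.values())
    return 1 - filled_cells / total_cells


def filter_by_occupancy(occupancy, taxa, min_occupancy):
    """Keep only loci present in at least `min_occupancy` (0 to 1) of the taxa."""
    taxa = set(taxa)
    return {
        locus: present
        for locus, present in occupancy.items()
        if len(set(present) & taxa) / len(taxa) >= min_occupancy
    }


# Toy example with hypothetical taxa and loci.
taxa = ["A", "B", "C", "D"]
occupancy = {
    "locus1": {"A", "B", "C", "D"},
    "locus2": {"A", "B"},
    "locus3": {"A"},
}

all_loci_missing = missing_fraction(occupancy, taxa)
stricter = filter_by_occupancy(occupancy, taxa, min_occupancy=0.5)
stricter_missing = missing_fraction(stricter, taxa)
```

Raising the occupancy threshold retains fewer loci but yields a more complete matrix, which is the trade-off between locus count and completeness that the abstract describes.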

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Anthozoa / classification*
  • Anthozoa / genetics*
  • Base Sequence
  • Gene Expression Profiling*
  • Genetic Loci
  • Genomics
  • Likelihood Functions
  • Phylogeny*
  • Transcriptome / genetics