Systematic investigations of gene effects on both topologies and supports: An Echinococcus illustration

J Bioinform Comput Biol. 2017 Oct;15(5):1750019. doi: 10.1142/S0219720017500196. Epub 2017 Aug 16.

Abstract

In this paper, we propose a high-performance computing toolbox implementing efficient statistical methods for the study of phylogenies. This toolbox, which implements logit models and LASSO-type penalties, provides a way to better understand, measure, and compare the impact of each gene on a global phylogeny. As an application, we study the Echinococcus phylogeny, which is often considered a particularly difficult example. Mitochondrial and nuclear genomes (19 coding sequences) of nine Echinococcus species are considered in order to investigate the molecular phylogeny of this genus. First, we check that the 19 gene trees lead to 19 totally different unsupported topologies (a topology is the set of sister relationships in a phylogenetic tree when both branch lengths and supports are ignored), while using the 19 genes as a whole is not sufficient for estimating the phylogeny. In order to circumvent this issue and understand the impact of the genes, we computed 43,796 trees using combinations ranging from 13 to 19 genes. By doing so, 15 topologies are obtained. Four particular topologies, which appear more robust and more frequent, are then selected for more precise investigation. Further refining our statistical analysis, we extract a particularly robust topology. We also carefully demonstrate the influence of nuclear genes on the likelihood of the phylogeny.
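The figure of 43,796 trees is consistent with one tree per subset of the 19 genes containing between 13 and 19 genes. The short Python check below is an illustrative sketch under that assumption (one tree per gene subset); the names `N_GENES`, `MIN_SUBSET`, and `count_gene_subsets` are hypothetical and are not taken from the toolbox described in the paper.

```python
from math import comb

N_GENES = 19      # coding sequences considered in the abstract
MIN_SUBSET = 13   # smallest gene combination used


def count_gene_subsets(n_genes, min_size):
    """Return {subset size: number of distinct gene subsets of that size}."""
    return {k: comb(n_genes, k) for k in range(min_size, n_genes + 1)}


# Assumption: exactly one tree is computed for each gene subset.
counts = count_gene_subsets(N_GENES, MIN_SUBSET)
total = sum(counts.values())

for size, n in counts.items():
    print(f"{size} genes: {n} subsets")
print(f"total: {total}")  # 43796, matching the count given in the abstract
```

Most of the subsets come from the smallest size: the 13-gene combinations alone account for 27,132 of the 43,796 trees.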

Keywords: Echinococcus; Molecular phylogeny; statistical tests.

MeSH terms

  • Animals
  • Cell Nucleus / genetics
  • Computational Biology / methods*
  • DNA, Ribosomal / genetics
  • Echinococcus / genetics
  • Echinococcus / physiology*
  • Gene Frequency
  • Genes
  • Genome, Mitochondrial / genetics
  • Models, Theoretical
  • Phylogeny*

Substances

  • DNA, Ribosomal