Re-evaluating Deep Neural Networks for Phylogeny Estimation: The Issue of Taxon Sampling

J Comput Biol. 2022 Jan;29(1):74-89. doi: 10.1089/cmb.2021.0383. Epub 2022 Jan 5.

Abstract

Deep neural networks (DNNs) have recently been proposed for quartet tree phylogeny estimation. Here, we present a study that compares recently trained DNNs with a collection of standard phylogeny estimation methods on a heterogeneous collection of datasets. The datasets were simulated under the same models that were used to train the DNNs, and also under similar conditions but with higher rates of evolution. Our study shows that using DNNs with quartet amalgamation is less accurate than several of the standard phylogeny estimation methods we explore (e.g., maximum likelihood and maximum parsimony). We further find that simple standard phylogeny estimation methods match or improve on DNNs for quartet accuracy, especially, but not exclusively, when used in a global manner (i.e., the tree on the full dataset is computed and the induced quartet trees are then extracted from it). Thus, our study provides evidence that a major challenge limiting the utility of current DNNs for phylogeny estimation is their restriction to estimating quartet trees, which must subsequently be combined into a tree on the full dataset. In contrast, global methods (i.e., those that estimate trees from the full set of sequences) are able to benefit from taxon sampling, and hence have higher accuracy on large datasets.
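To make the "global" use of a method concrete, the following is a minimal illustrative sketch in Python of extracting induced quartet trees from a tree on the full dataset and scoring quartet accuracy against a reference tree. It is not the authors' code: the nested-tuple tree encoding, the function names (`leaf_sets`, `induced_quartet`, `quartet_accuracy`), and the five-taxon example trees are all assumptions made for illustration.

```python
from itertools import combinations

# Assumption: a tree is a nested tuple of leaf-name strings, rooted at an
# arbitrary point, e.g. (("A", "B"), ("C", ("D", "E"))).

def leaf_sets(tree):
    """Return (leaves below this node, list of leaf sets of all subtrees)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
        clades.append(child_leaves)
    return leaves, clades

def induced_quartet(tree, quartet):
    """Return the quartet topology induced by `tree` as a set of two leaf
    pairs, or None if the tree does not resolve these four leaves."""
    _, clades = leaf_sets(tree)
    q = frozenset(quartet)
    for clade in clades:
        side = clade & q
        # A subtree holding exactly two of the four leaves separates them
        # from the other two, which fixes the induced quartet topology.
        if len(side) == 2:
            return frozenset([side, q - side])
    return None

def quartet_accuracy(true_tree, est_tree):
    """Fraction of four-leaf subsets on which the two trees induce the same
    quartet topology (assumes both trees have the same leaf set)."""
    leaves, _ = leaf_sets(true_tree)
    quartets = list(combinations(sorted(leaves), 4))
    matches = sum(
        induced_quartet(true_tree, q) == induced_quartet(est_tree, q)
        for q in quartets
    )
    return matches / len(quartets)

# Hypothetical example trees on five taxa.
true_tree = (("A", "B"), ("C", ("D", "E")))
est_tree = (("A", "C"), ("B", ("D", "E")))
print(quartet_accuracy(true_tree, est_tree))  # 0.6 (3 of 5 quartets agree)
```

In this sketch the number of quartets grows as the fourth power of the number of taxa, so it is only practical for small trees; it is meant to show what "induced quartet trees" and "quartet accuracy" refer to, not how the study computed them.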

Keywords: deep neural networks; phylogeny estimation; heterotachy.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Classification / methods
  • Computational Biology
  • Computer Simulation
  • Databases, Genetic / statistics & numerical data
  • Deep Learning*
  • Evolution, Molecular
  • Neural Networks, Computer*
  • Phylogeny*