PhyloFisher: A phylogenomic package for resolving eukaryotic relationships

PLoS Biol. 2021 Aug 6;19(8):e3001365. doi: 10.1371/journal.pbio.3001365. eCollection 2021 Aug.

Abstract

Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships are now common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher (https://github.com/TheBrownLab/PhyloFisher), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that perform the diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic "single-copy orthogroup" datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Eukaryota / genetics*
  • Phylogeny*
  • Software*

Grants and funding

This project was supported primarily by the United States National Science Foundation (NSF) Division of Environmental Biology (DEB) grants 1456054 and 2100888 (http://www.nsf.gov), awarded to MWB. TP’s postdoctoral stay in MWB’s laboratory was supported by the J.W. Fulbright Commission of the Czech Republic, awarded to TP. The ME and MK labs are supported by the Czech Science Foundation (grants 18-18699S and 18-28103S, respectively) and the ‘Centre for Research of Pathogenicity and Virulence of Parasites’ (ERD funds, project no. CZ.02.1.01/0.0/0.0/16_019/0000759). ES was supported by International Mobilities of Researchers of the Biology Centre (CZ.02.2.69/0.0/0.0/16_027/0008357) and the MSCA-IF-CZ SMART (CZ.02.2.69/0.0/0.0/20_079/0017809). Research on phylogenomics in AR’s lab is supported by the National Science Foundation (DEB-1442113). LE is supported by a grant from the European Research Council (ERC Starting grant 803151). FB thanks Science for Life Laboratory for supporting the work of JFHS in his laboratory, and JFHS thanks the German Research Foundation (DFG; STR1349/2-1, project # 432453260) for support. MK thanks IT4Innovations National Super Computer Center, Technical University of Ostrava, Ostrava, Czech Republic (project #Open-20-18) for providing computational resources. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.