GToTree: a user-friendly workflow for phylogenomics

Bioinformatics. 2019 Oct 15;35(20):4162-4164. doi: 10.1093/bioinformatics/btz188.

Abstract

Summary: Genome-level evolutionary inference (i.e. phylogenomics) is becoming an essential step in many biologists' work. Accordingly, there are several tools available for the major steps in a phylogenomics workflow. But for the biologist whose main focus is not bioinformatics, much of the computational work required (such as accessing genomic data on large scales, integrating genomes from different file formats, performing required filtering, stitching different tools together, etc.) can be prohibitive. Here I introduce GToTree, a command-line tool that can take any combination of fasta files, GenBank files and/or NCBI assembly accessions as input and output an alignment file, estimates of genome completeness and redundancy, and a phylogenomic tree based on a specified single-copy gene (SCG) set. Although GToTree can work with any custom hidden Markov models (HMMs), also included are 13 newly generated SCG-set HMMs for different lineages and levels of resolution, built from searches of ∼12 000 bacterial and archaeal high-quality genomes. GToTree aims to give more researchers the capability to make phylogenomic trees.
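
To make the idea of SCG-based completeness and redundancy estimates concrete, the following is a minimal sketch and not GToTree's own code. It assumes that completeness is the percentage of genes in the SCG set with at least one hit in a genome, and that redundancy is the percentage with more than one hit. The function name, gene names and hit list are hypothetical, chosen only for illustration.

```python
from collections import Counter


def estimate_completeness_redundancy(hit_gene_ids, scg_set):
    """Return (completeness %, redundancy %) for one genome.

    Assumed definitions (not taken from the GToTree source):
      completeness = % of SCGs with at least one hit
      redundancy   = % of SCGs with more than one hit
    """
    if not scg_set:
        raise ValueError("scg_set must contain at least one gene")

    # Count hits per gene, ignoring hits to genes outside the SCG set.
    counts = Counter(g for g in hit_gene_ids if g in scg_set)

    total = len(scg_set)
    found = sum(1 for g in scg_set if counts[g] >= 1)
    multi = sum(1 for g in scg_set if counts[g] > 1)
    return 100.0 * found / total, 100.0 * multi / total


# Hypothetical four-gene SCG set and HMM hits for a single genome.
scg_set = {"rpoB", "recA", "gyrB", "rplB"}
hits = ["rpoB", "recA", "recA", "gyrB", "unrelated"]

completeness, redundancy = estimate_completeness_redundancy(hits, scg_set)
print(f"completeness: {completeness:.1f}%  redundancy: {redundancy:.1f}%")
```

In this toy case three of the four target genes are found and one of them is hit twice, so a genome with many multi-hit SCGs would stand out as a candidate for filtering before tree building.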

Availability and implementation: GToTree is open-source and freely available for download from github.com/AstrobioMike/GToTree. It is implemented primarily in Bash, with helper scripts written in Python.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Databases, Nucleic Acid
  • Genomics
  • Phylogeny*
  • Software*
  • Workflow*