TreeTrimmer: a method for phylogenetic dataset size reduction

BMC Res Notes. 2013 Apr 12;6:145. doi: 10.1186/1756-0500-6-145.

Abstract

Background: With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, applying rigorous tree-building methods to such large datasets is computationally prohibitive, and manual 'pruning' of sequence alignments is time-consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to carry out such pruning procedures objectively.

Findings: Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis.
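The pruning idea described above can be illustrated with a minimal sketch. This is not the TreeTrimmer implementation; it is a simplified illustration under stated assumptions: the tree is a nested tuple of leaf labels, the user-defined notion of 'redundant' is "every OTU in the clade belongs to the same group", and the first `keep` leaves of such a clade serve as its representatives. The names `trim`, `leaves`, `genus`, and all labels in `toy_tree` are hypothetical.

```python
from typing import Callable, List, Union

# Assumption: a tree is either a leaf label (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf labels of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree:
        out.extend(leaves(child))
    return out


def trim(tree: Tree, group_of: Callable[[str], str], keep: int = 1) -> List[str]:
    """Return the OTUs retained after collapsing single-group clades.

    A clade whose leaves all map to one group is treated as redundant and
    reduced to its first `keep` leaves; mixed clades are searched recursively.
    """
    if isinstance(tree, str):
        return [tree]
    tips = leaves(tree)
    if len({group_of(t) for t in tips}) == 1:
        return tips[:keep]  # hypothetical rule for choosing representatives
    kept: List[str] = []
    for child in tree:
        kept.extend(trim(child, group_of, keep))
    return kept


def genus(label: str) -> str:
    """Hypothetical grouping rule: the text before the first underscore."""
    return label.split("_")[0]


# Hypothetical toy tree; labels are illustrative, not data from the paper.
toy_tree = (
    (("Porphyra_a", "Porphyra_b"), "Porphyra_c"),
    (("Chondrus_a", "Chondrus_b"), "Gracilaria_a"),
)

retained = trim(toy_tree, genus, keep=1)
print(retained)  # ['Porphyra_a', 'Chondrus_a', 'Gracilaria_a']
```

In this sketch every group present in the input is still represented in the output, which mirrors the stated goal of lowering OTU density without losing taxonomic diversity. A practical tool would also need to read standard tree formats and take branch support into account; neither is modelled here.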

Conclusions: TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up computationally intensive downstream analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Automation
  • Computational Biology / methods*
  • Databases, Genetic
  • Electron Transport Complex IV / genetics
  • Information Storage and Retrieval
  • Phylogeny*
  • Probability
  • Rhodophyta / genetics
  • Sequence Analysis, DNA
  • Software

Substances

  • Electron Transport Complex IV