Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed for filtering putative orthology groups as homology or non-homology clusters by considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction especially in instances of lower-quality transcriptome assemblies.
Availability and implementation: https://github.com/byucsl/ogcleaner CONTACT: sfujimoto@gmail.comSupplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.