The OGCleaner: filtering false-positive homology clusters

Bioinformatics. 2017 Jan 1;33(1):125-127. doi: 10.1093/bioinformatics/btw571. Epub 2016 Sep 10.

Abstract

Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually apply only to pairwise sequence comparisons and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool that filters putative orthology groups by classifying each as a homology or non-homology cluster, considering all sequences in the cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction, especially for lower-quality transcriptome assemblies.
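The cluster-level idea described in the abstract can be illustrated with a minimal sketch. This is not the OGCleaner's implementation: the two features (mean pairwise similarity and length variation), the nearest-centroid classifier, the toy sequences and all function names are assumptions chosen for illustration, and difflib's similarity ratio is only a crude stand-in for alignment-based identity.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean, pstdev


def cluster_features(seqs):
    """Summarise a whole cluster (at least two sequences) with two
    hypothetical features; these are not the OGCleaner's feature set."""
    # Mean similarity over every pair of sequences in the cluster.
    sims = [SequenceMatcher(None, a, b, autojunk=False).ratio()
            for a, b in combinations(seqs, 2)]
    # Coefficient of variation of sequence length across the cluster.
    lengths = [len(s) for s in seqs]
    length_cv = pstdev(lengths) / mean(lengths)
    return (mean(sims), length_cv)


def train_centroids(labelled_clusters):
    """Average the feature vectors of each class (nearest-centroid model)."""
    by_label = {}
    for seqs, label in labelled_clusters:
        by_label.setdefault(label, []).append(cluster_features(seqs))
    return {label: tuple(mean(col) for col in zip(*feats))
            for label, feats in by_label.items()}


def classify(seqs, centroids):
    """Return the label whose centroid is closest in feature space."""
    f = cluster_features(seqs)
    return min(
        centroids,
        key=lambda lab: sum((x - c) ** 2 for x, c in zip(f, centroids[lab])),
    )


# Toy training data (invented): in the paper's setting, the positive
# examples would come from curated OrthoDB orthologous groups.
training = [
    (["MKTAYIAKQR", "MKTAYIAKQK", "MKSAYIAKQR"], "homology"),
    (["GAVLIPFMWG", "GAVLIPFMWA", "GAVLLPFMWG"], "homology"),
    (["MKTAYIAKQR", "WWPPGGHHDDEE", "CCNNQQ"], "non-homology"),
    (["HHHHHHHH", "DDDDD", "LLLLLLLLLLLL"], "non-homology"),
]
centroids = train_centroids(training)

# Classify two unseen putative clusters.
print(classify(["MKTAYIAKQR", "MKTAYLAKQR", "MKTAYIAKHR"], centroids))
print(classify(["MKTAYIAKQR", "PPPPPGGGGGWWWWWDD", "CNC"], centroids))
```

Because the features are computed over all sequences in a cluster rather than a single pair, one divergent member lowers the cluster's score, which is the property the abstract contrasts with pairwise heuristics.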

Availability and implementation: https://github.com/byucsl/ogcleaner

Contact: sfujimoto@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms*
  • Molecular Sequence Annotation
  • Phylogeny
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism
  • Proteomics / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid*

Substances

  • Proteins