Annotation inconsistencies beyond sequence similarity-based function prediction - phylogeny and genome structure

Stand Genomic Sci. 2015 Nov 19:10:108. doi: 10.1186/s40793-015-0101-2. eCollection 2015.

Abstract

The function annotation process in computational biology has increasingly shifted from the traditional characterization of the individual biochemical roles of protein molecules to the system-wide detection of entire metabolic pathways and genomic structures. These so-called genome-aware methods broaden the scope of annotation inconsistencies in genome sequences beyond protein function assignments to include phylogenetic anomalies and artifactual genomic regions. We outline three categories of error propagation in databases and illustrate each with striking examples, ranging from traditional errors that the community already recognizes to emerging ones that it is only beginning to appreciate, with the aim of raising awareness and motivating future solutions.

Keywords: Error propagation; Genome evolution; Genome structure; Genome-aware methods; Genome-wide annotation; Mis-annotation modeling; Next-generation sequencing; Protein function prediction.