GIIRA--RNA-Seq driven gene finding incorporating ambiguous reads

Bioinformatics. 2014 Mar 1;30(5):606-13. doi: 10.1093/bioinformatics/btt577. Epub 2013 Oct 11.

Abstract

Motivation: The reliable identification of genes is a major challenge in genome research, as further analysis depends on the correctness of this initial step. With high-throughput RNA-Seq data reflecting currently expressed genes, a particularly meaningful source of information has become commonly available for gene finding. However, its use in automated gene identification is not yet standard practice. A particular obstacle to including RNA-Seq data is the handling of ambiguously mapped reads.

Results: We present GIIRA (Gene Identification Incorporating RNA-Seq data and Ambiguous reads), a novel prokaryotic and eukaryotic gene finder that is based exclusively on an RNA-Seq mapping and inherently includes ambiguously mapped reads. GIIRA extracts candidate regions supported by a sufficient number of mappings and reassigns ambiguous reads to their most likely origin using a maximum-flow approach. This avoids the exclusion of genes that are predominantly supported by ambiguous mappings. Evaluation on simulated and real data and comparison with existing methods incorporating RNA-Seq information highlight the accuracy of GIIRA in identifying the expressed genes.
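
To make the idea of reassigning ambiguous reads by maximum flow more concrete, the following is a minimal Python sketch of one generic way such a problem can be posed. It is not GIIRA's implementation: the abstract does not describe how the flow network is built, so the network layout (source to reads, reads to candidate regions, regions to sink), the function names `max_flow` and `reassign_reads`, and the per-region capacities are all assumptions made for illustration. The flow itself is computed with the standard Edmonds-Karp algorithm.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    # residual[u][v] holds the remaining capacity on edge u -> v
    residual = defaultdict(lambda: defaultdict(int))
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual[u][v] += cap
            residual[v][u] += 0  # ensure the reverse edge exists
    total = 0
    while True:
        # Breadth-first search for an augmenting path from source to sink
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        # Find the bottleneck capacity along the path
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount of flow along the path
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def reassign_reads(read_candidates, region_capacity):
    """Assign each read to one candidate region (hypothetical toy model).

    read_candidates: read id -> list of regions the read maps to
    region_capacity: region id -> number of reads the region may absorb
                     (an assumed quantity, not taken from the paper)
    """
    capacity = {"source": {}}
    for read, regions in read_candidates.items():
        capacity["source"][("read", read)] = 1  # each read is placed once
        capacity[("read", read)] = {("region", r): 1 for r in regions}
    for region, cap in region_capacity.items():
        capacity[("region", region)] = {"sink": cap}
    total, residual = max_flow(capacity, "source", "sink")
    assignment = {}
    for read, regions in read_candidates.items():
        for r in regions:
            # A saturated read -> region edge carries that read's unit of flow
            if residual[("read", read)][("region", r)] == 0:
                assignment[read] = r
    return total, assignment


# Toy input: r1 maps uniquely to region A; r2 and r3 are ambiguous
reads = {"r1": ["A"], "r2": ["A", "B"], "r3": ["A", "B"]}
capacities = {"A": 2, "B": 1}
total, assignment = reassign_reads(reads, capacities)
print(total, assignment)
```

In this toy input, region A can absorb only two reads, so one of the two ambiguous reads is routed to region B rather than being discarded. That mirrors, in a very simplified form, the point made above: a region supported mainly by ambiguous mappings can still receive read support.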

Availability and implementation: GIIRA is implemented in Java and is available from https://sourceforge.net/projects/giira/.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms
  • Animals
  • Escherichia coli / genetics
  • Gene Expression Profiling / methods*
  • Genes*
  • Genomics
  • Humans
  • Saccharomyces cerevisiae / genetics
  • Sequence Alignment
  • Sequence Analysis, RNA / methods*