Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure

Bioinformatics. 2007 Jun 15;23(12):1468-75. doi: 10.1093/bioinformatics/btm133. Epub 2007 May 5.

Abstract

Motivation: Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other.

Results: We present a method for combining gene-finders called Genomix. Genomix selects the predicted exons that are best conserved within and/or between species in terms of sequence and intron-exon structure, and combines them into a gene structure. Genomix was used to combine predictions from four gene-finders for Caenorhabditis elegans, by selecting the predicted exons that are best conserved with C. briggsae and C. remanei. On a set of approximately 1500 confirmed C. elegans genes, Genomix increased the exon-level specificity by 10.1% and sensitivity by 2.7% compared to the best input gene-finder.
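
The abstract does not spell out how exon-level sensitivity and specificity are computed. The sketch below assumes the convention common in gene-finder evaluation: a predicted exon counts as correct only if both of its boundaries match a confirmed exon exactly, sensitivity is the fraction of confirmed exons that were predicted correctly, and specificity is the fraction of predicted exons that are correct. The `Exon` tuple layout, the function name `exon_level_accuracy` and the toy coordinates are hypothetical and are not taken from the Genomix scripts.

```python
from typing import Iterable, Tuple

# Hypothetical exon representation: (sequence_id, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(
    predicted: Iterable[Exon], confirmed: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumption: an exon is correct only when sequence, start, end and
    strand all match a confirmed exon exactly.
    """
    predicted_set = set(predicted)
    confirmed_set = set(confirmed)
    correct = predicted_set & confirmed_set

    # Sensitivity: share of confirmed exons that were recovered exactly.
    sensitivity = len(correct) / len(confirmed_set) if confirmed_set else 0.0
    # Specificity (gene-finding sense): share of predictions that are correct.
    specificity = len(correct) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Toy data, invented purely for illustration.
confirmed_exons = [
    ("chrI", 100, 200, "+"),
    ("chrI", 300, 400, "+"),
    ("chrI", 500, 600, "+"),
    ("chrI", 700, 800, "+"),
]
predicted_exons = [
    ("chrI", 100, 200, "+"),  # exact match
    ("chrI", 300, 410, "+"),  # wrong 3' boundary, counted as incorrect
    ("chrI", 500, 600, "+"),  # exact match
]

sn, sp = exon_level_accuracy(predicted_exons, confirmed_exons)
print(f"sensitivity={sn:.3f} specificity={sp:.3f}")
```

Under this convention a combiner can raise specificity by discarding poorly supported exons and raise sensitivity by recovering exons that the best single gene-finder missed, which is the kind of change the reported figures describe.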

Availability: Scripts and Supplementary Material can be found at http://www.sanger.ac.uk/Software/analysis/genomix

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Caenorhabditis / genetics
  • Computational Biology / methods*
  • Conserved Sequence*
  • DNA / genetics
  • DNA, Intergenic / genetics
  • Exons*
  • Genome, Helminth
  • Introns*
  • Models, Genetic
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Software*

Substances

  • DNA, Intergenic
  • DNA