A comparative guide to gene prediction tools for the bioinformatics amateur

Int J Oncol. 2002 Apr;20(4):697-705. doi: 10.3892/ijo.20.4.697.

Abstract

Several hundred programs using different algorithms have been designed to predict individual coding features within any genomic sequence, but none of these tools covers all aspects of a gene or is 100% accurate in its prediction. Automated simultaneous processing of the results from a number of these programs minimizes the chance of a false positive prediction and quickly generates integrated data. We report here on the analysis of two known genes in 5 kb and 25 kb segments of genomic sequence using four genome annotation packages: NIX, RUMMAGE, Genotator and EMBOSS. Gene predictions were confirmed using cDNA sequences and a comparison was made between the packages. NIX, RUMMAGE and Genotator were similarly able to predict well-characterised genes and basic structures, but all three gave poor exon prediction for a small, three-exon gene. However, the BLAST subprograms of all three packages correctly identified the three exons. In addition, EST BLAST subprograms identified a previously undescribed, possible 5' untranslated exon for the smaller gene and a number of putative alternatively spliced exons in the larger gene. Overall, NIX was found to be the most user-friendly package, in terms of easy access to databases and the interactive graphical display of results.
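
The idea of combining the output of several predictors to reduce false positives can be illustrated with a minimal sketch. It is not the method used by NIX, RUMMAGE, Genotator or EMBOSS; it assumes a simple per-base majority vote, and the tool names, coordinates and the `consensus_exons` function are hypothetical.

```python
from collections import Counter


def consensus_exons(predictions, min_votes=2):
    """Return exon intervals supported by at least `min_votes` tools.

    `predictions` maps a tool name to a list of (start, end) exon
    intervals, with inclusive coordinates on the same sequence.
    This per-base vote is an illustrative assumption only.
    """
    votes = Counter()
    for intervals in predictions.values():
        # Collect positions in a set so each tool votes once per base,
        # even if its own predicted intervals overlap.
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end + 1))
        votes.update(covered)

    # Keep the positions with enough support, in sequence order.
    supported = sorted(pos for pos, n in votes.items() if n >= min_votes)

    # Merge runs of adjacent positions back into intervals.
    merged = []
    for pos in supported:
        if merged and pos == merged[-1][1] + 1:
            merged[-1][1] = pos
        else:
            merged.append([pos, pos])
    return [tuple(interval) for interval in merged]


# Hypothetical predictions from three unnamed tools.
example = {
    "tool_a": [(100, 200), (400, 500)],
    "tool_b": [(120, 210), (800, 900)],
    "tool_c": [(110, 190), (405, 495)],
}
print(consensus_exons(example, min_votes=2))
```

In this toy input, the exon reported only by `tool_b` at 800 to 900 is dropped because no other tool supports it. Raising `min_votes` makes the consensus stricter, at the cost of discarding true exons that only one program detects.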

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromosomes, Human, Pair 17 / genetics
  • Computational Biology / methods*
  • DNA / genetics*
  • DNA, Complementary / genetics
  • Databases, Factual
  • Expressed Sequence Tags
  • Gene Expression / genetics*
  • Humans
  • Nuclear Proteins / genetics*
  • RNA Splicing
  • Ribonucleoproteins / genetics
  • Serine-Arginine Splicing Factors
  • Sialyltransferases / genetics*
  • Software
  • Spliceosomes

Substances

  • DNA, Complementary
  • Nuclear Proteins
  • Ribonucleoproteins
  • SRSF2 protein, human
  • Serine-Arginine Splicing Factors
  • DNA
  • Sialyltransferases
  • galactosyl-1-3-N-acetylgalactosaminyl-specific 2,6-sialyltransferase