WISCOD: a statistical web-enabled tool for the identification of significant protein coding regions

Biomed Res Int. 2014;2014:282343. doi: 10.1155/2014/282343. Epub 2014 Sep 15.

Abstract

Classically, gene prediction programs detect signals such as boundary sites (splice sites, starts, and stops) and coding regions in the DNA sequence in order to build potential exons and join them into a gene structure. Although their performance can now be improved with additional information from related species and/or cDNA databases, further improvement at any step could help to obtain better predictions. Here, we present WISCOD (a web-enabled tool for the identification of significant protein coding regions), a novel software tool that tackles the exon prediction problem in eukaryotic genomes. WISCOD can detect real exons in large lists of potential exons, and it provides an easy-to-use global P value, called the expected probability of being a false exon (EPFE), that is useful for ranking potential exons in a probabilistic framework without additional computational cost. The advantage of our approach is that it significantly increases specificity and sensitivity (both between 80% and 90%) in comparison with other ab initio methods (where they are in the range of 70-75%). WISCOD is written in Java and R and is available to download and run locally on Linux and Windows platforms.
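
The ranking and evaluation idea in the abstract can be illustrated with a minimal sketch. It is not WISCOD's implementation and does not compute EPFE: the candidate names, scores, labels, and the 0.05 cutoff are all invented for illustration, and the function names are hypothetical. It assumes only that each potential exon carries a P-value-like score where lower means less likely to be a false exon. Sensitivity and specificity follow the standard statistical definitions; some gene-prediction papers use "specificity" to mean TP/(TP+FP) instead, and the abstract does not say which convention applies.

```python
# Hypothetical candidate exons: (identifier, EPFE-like score, is_real_exon).
# All scores and labels are invented for illustration only.
candidates = [
    ("exon_a", 0.01, True),
    ("exon_b", 0.03, True),
    ("exon_c", 0.20, False),
    ("exon_d", 0.04, False),
    ("exon_e", 0.60, False),
    ("exon_f", 0.30, True),
]


def rank_by_score(cands):
    """Return candidates sorted by ascending score (most exon-like first)."""
    return sorted(cands, key=lambda c: c[1])


def sensitivity_specificity(cands, threshold):
    """Call a candidate an exon when score <= threshold, then score the calls.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    tp = sum(1 for _, s, real in cands if s <= threshold and real)
    fn = sum(1 for _, s, real in cands if s > threshold and real)
    tn = sum(1 for _, s, real in cands if s > threshold and not real)
    fp = sum(1 for _, s, real in cands if s <= threshold and not real)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


if __name__ == "__main__":
    for name, score, real in rank_by_score(candidates):
        print(f"{name}\t{score:.2f}\t{'real' if real else 'false'}")
    sens, spec = sensitivity_specificity(candidates, threshold=0.05)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sweeping the threshold over the ranked list and recording each (sensitivity, specificity) pair would trace out a ROC curve, which is one common way to compare such rankings.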

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Chromosomes, Human, Pair 9 / genetics
  • Computer Simulation
  • Databases, Protein
  • Drosophila melanogaster / metabolism
  • Exons / genetics
  • Humans
  • Internet*
  • Open Reading Frames / genetics*
  • PAX5 Transcription Factor / genetics
  • Probability
  • ROC Curve
  • Software*
  • Statistics as Topic*

Substances

  • PAX5 Transcription Factor
  • PAX5 protein, human