ASPic-GeneID: a lightweight pipeline for gene prediction and alternative isoforms detection

Biomed Res Int. 2013:2013:502827. doi: 10.1155/2013/502827. Epub 2013 Nov 7.

Abstract

New genomes are being sequenced at an increasingly rapid rate, far outpacing the rate at which manual gene annotation can be performed. Automated genome annotation is thus necessitated by this growth in genome projects; however, full-fledged annotation systems are usually home-grown and customized to a particular genome. There is thus a renewed need for accurate ab initio gene prediction methods. However, it is apparent that fully ab initio methods fall short of the required level of sensitivity and specificity for a quality annotation. Evidence in the form of expressed sequences gives the single biggest improvement in accuracy when used to inform gene predictions. Here, we present a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The two main components are ASPic, a program that derives highly accurate, albeit not necessarily complete, EST-based transcript annotations from EST alignments, and GeneID, a standard gene prediction program, which we have modified to take intron annotations as evidence. The introns output by ASPic CDS predictions are given to GeneID to constrain the exon-chaining process and produce predictions consistent with the underlying EST alignments. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.
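
The hand-off between the two components can be illustrated with a minimal sketch, assuming introns are taken as the gaps between consecutive CDS exons of one transcript and written as GFF-style records. The function names (`introns_from_cds`, `to_gff`), the coordinates, and the `Intron` feature label are hypothetical; the abstract does not specify the exact evidence format read by the modified GeneID.

```python
from typing import List, Tuple

# Hypothetical convention: 1-based, inclusive (start, end) coordinates.
Interval = Tuple[int, int]


def introns_from_cds(exons: List[Interval]) -> List[Interval]:
    """Return the introns lying between consecutive CDS exons of one transcript."""
    ordered = sorted(exons)
    introns = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        # Skip adjacent or overlapping exons: they leave no intron between them.
        if right_start - left_end > 1:
            introns.append((left_end + 1, right_start - 1))
    return introns


def to_gff(seqname: str, introns: List[Interval],
           strand: str = "+", source: str = "ASPic") -> List[str]:
    """Format introns as tab-separated GFF-style lines (assumed layout)."""
    return [
        "\t".join([seqname, source, "Intron", str(s), str(e), ".", strand, "."])
        for s, e in introns
    ]


# Made-up CDS exon coordinates, for illustration only.
cds_exons = [(100, 200), (301, 400), (551, 700)]
for line in to_gff("chrI", introns_from_cds(cds_exons)):
    print(line)
```

In this sketch each intron record fixes a donor/acceptor pair, which is the kind of constraint a gene predictor could honor when chaining exons; how GeneID actually applies the evidence is described in the full paper, not here.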

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Chromosome Mapping
  • Computational Biology / methods*
  • Exons
  • Expressed Sequence Tags
  • Genome
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Molecular Sequence Annotation*
  • Sequence Analysis, DNA / methods*
  • Software*