A graph based algorithm for generating EST consensus sequences

Bioinformatics. 2005 Apr 15;21(8):1371-5. doi: 10.1093/bioinformatics/bti184. Epub 2004 Nov 30.

Abstract

Motivation: EST sequences constitute an abundant, yet error-prone, resource for computational biology. Expressed sequences are important in gene discovery and identification, and they are also crucial for the discovery and classification of alternative splicing. An important challenge when processing EST sequences is the reconstruction of mRNA by assembling EST clusters into consensus sequences.

Results: Unlike the more established assembly tools, the algorithm we propose constructs a graph over sequence fragments of fixed size and produces consensus sequences as traversals of this graph. We provide a tool implementing this algorithm and perform an experiment in which the consensus sequences produced by our implementation, as well as by currently available tools, are compared to mRNA. The results show that, in a majority of cases, our proposed algorithm produces consensus sequences of higher quality than the established sequence assemblers, at a competitive speed.
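The general idea of a graph over fixed-size fragments can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the graph is weighted or traversed, so the fragment length `k`, the edge counts, the choice of start node, and the greedy "follow the most frequent successor" traversal are all assumptions made for illustration, as are the function names.

```python
from collections import defaultdict


def build_kmer_graph(ests, k):
    """Map each k-mer to its successor k-mers, counting how often each edge occurs."""
    graph = defaultdict(lambda: defaultdict(int))
    for seq in ests:
        # Consecutive k-mers overlap by k - 1 bases; sequences shorter than
        # k + 1 contribute no edges.
        for i in range(len(seq) - k):
            graph[seq[i:i + k]][seq[i + 1:i + k + 1]] += 1
    return graph


def pick_start(graph):
    """Assumed heuristic: prefer a k-mer with no incoming edge, heaviest first."""
    targets = {succ for succs in graph.values() for succ in succs}
    sources = [node for node in graph if node not in targets]
    pool = sources or list(graph)  # fall back to any node if the graph is cyclic
    return max(pool, key=lambda n: (sum(graph[n].values()), n))


def consensus(ests, k):
    """Greedy traversal: repeatedly follow the most frequent unvisited successor."""
    graph = build_kmer_graph(ests, k)
    if not graph:
        return ""
    node = pick_start(graph)
    result = node
    visited = {node}
    while True:
        candidates = {n: w for n, w in graph.get(node, {}).items()
                      if n not in visited}
        if not candidates:
            break
        # Highest count wins; ties are broken lexicographically for determinism.
        node = max(candidates, key=lambda n: (candidates[n], n))
        visited.add(node)
        result += node[-1]  # each step extends the consensus by one base
    return result


if __name__ == "__main__":
    # Toy cluster: two overlapping reads plus one read with a single-base error.
    cluster = ["ATGGCGTGCA", "GGCGTGCAAT", "ATGGCGAGCA"]
    print(consensus(cluster, 4))  # ATGGCGTGCAAT
```

In this toy cluster the erroneous read creates a branch that is supported by only one sequence, so the traversal follows the branch supported by two. Real EST clusters would also need handling for reverse complements, repeats, and alternative splice forms, none of which this sketch attempts.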

Availability: The source code for the implementation is available under a GPL license from http://www.ii.uib.no/~ketil/bioinformatics/

Contact: ketil@ii.uib.no.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Graphics
  • Consensus Sequence / genetics*
  • Conserved Sequence / genetics
  • Contig Mapping / methods*
  • Expressed Sequence Tags*
  • Numerical Analysis, Computer-Assisted
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • User-Computer Interface