A fast hybrid short read fragment assembly algorithm

Bioinformatics. 2009 Sep 1;25(17):2279-80. doi: 10.1093/bioinformatics/btp374. Epub 2009 Jun 17.

Abstract

The shorter and vastly more numerous reads produced by second-generation sequencing technologies require new tools that can assemble massive numbers of reads in a reasonable time. Existing short-read assembly tools can be classified into two categories: greedy extension-based and graph-based. While the graph-based approaches are generally superior in terms of assembly quality, the computational resources required for building and storing a huge graph are very high. In this article, we present Taipan, an assembly algorithm that can be viewed as a hybrid of these two approaches. Taipan uses greedy extensions for contig construction but at each step realizes enough of the corresponding read graph to make better decisions as to how assembly should continue. We show that this approach, using only a moderate amount of computing resources, can achieve an assembly quality at least as good as that of the graph-based approaches used in the popular Edena and Velvet assembly tools.
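
The hybrid idea described above can be illustrated with a toy sketch. This is not Taipan's implementation: the function names (`overhangs`, `extend_contig`), the brute-force overlap search, and the stopping rule (halt when the reads overlapping the contig end disagree about the next bases) are all assumptions made for illustration. It shows only the general pattern of a greedy extension that inspects the local neighbourhood of the read graph before committing to a step.

```python
def overhangs(contig, reads, used, min_overlap):
    """Find unused reads whose prefix overlaps the contig's 3' end.

    Returns a list of (overlap_length, read_index, overhang) tuples, where
    overhang is the part of the read that extends past the contig end.
    Brute force, for illustration only.
    """
    found = []
    for i, read in enumerate(reads):
        if i in used:
            continue
        # Try the longest possible overlap first; keep the best one per read.
        longest = min(len(read) - 1, len(contig))
        for k in range(longest, min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                found.append((k, i, read[k:]))
                break
    return found


def consistent(a, b):
    """Two overhangs agree if one is a prefix of the other."""
    return a.startswith(b) or b.startswith(a)


def extend_contig(reads, seed_index, min_overlap):
    """Greedily extend a seed read to the right.

    Assumed decision rule (hypothetical, not taken from the paper): the
    candidate reads at the contig end are treated as the local part of the
    read graph. If their overhangs conflict, the graph branches here and
    extension stops instead of guessing.
    """
    contig = reads[seed_index]
    used = {seed_index}
    while True:
        candidates = overhangs(contig, reads, used, min_overlap)
        if not candidates:
            break  # dead end: nothing overlaps the contig end
        if any(not consistent(x[2], y[2]) for x in candidates for y in candidates):
            break  # branch in the local read graph
        # Greedy step: take the read with the longest overlap.
        _, index, hang = max(candidates, key=lambda c: c[0])
        contig += hang
        used.add(index)
    return contig


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA", "TGCAAT", "CAATCC"]
    print(extend_contig(reads, seed_index=0, min_overlap=4))
```

A purely greedy assembler would always take the longest overlap, even when two candidate reads imply different continuations. The consistency check is the part of this sketch that stands in for looking at the local read graph.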

MeSH terms

  • Algorithms*
  • Computational Biology
  • Databases, Nucleic Acid
  • Helicobacter pylori / genetics*
  • Sequence Analysis, DNA / methods*
  • Staphylococcus aureus / genetics*