Using a priori knowledge to align sequencing reads to their exact genomic position

Nucleic Acids Res. 2012 Sep;40(16):e125. doi: 10.1093/nar/gks393. Epub 2012 May 11.

Abstract

The use of a priori knowledge in aligning targeted sequencing data is investigated through computational experiments. By adapting a Needleman-Wunsch algorithm to incorporate the genomic position information from the targeted capture, we demonstrate that reads can be aligned to the target region of interest alone. When direct string comparison is used in addition, alignment is up to 8 times faster than with the fastest conventional aligner (Bowtie). As a result, approximately 56 million captured reads from a targeted sequencing experiment can be aligned in a total of around 7 min. For conventional aligners such as Bowtie, BWA or MAQ, aligning to the target region alone is not feasible: experiments show that it yields 88% more SNP calls, the vast majority of which (≈92%) are false positives.
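The abstract does not spell out the algorithm, so what follows is only a minimal Python sketch of the general idea under stated assumptions. Each read's expected start position (`target_start`) is assumed to be known from the capture design. A direct string comparison at that position is tried first, and only reads that fail it are aligned with a Needleman-Wunsch-style dynamic program restricted to a small reference window (`pad` bases on each side). The function names `fit_align` and `align_targeted`, the scoring values, and the choice of a free-end-gap ("fitting") variant of Needleman-Wunsch are hypothetical illustrations, not the authors' implementation.

```python
from typing import Tuple

# Assumed scoring scheme for illustration only; not taken from the paper.
MATCH, MISMATCH, GAP = 1, -1, -2


def fit_align(read: str, window: str) -> Tuple[int, int, int]:
    """Needleman-Wunsch-style DP in which reference bases before and after the
    read are free (a semi-global "fitting" variant; an assumption here).
    Returns (score, start, end) such that the read aligns to window[start:end]."""
    n, m = len(read), len(window)
    # S[i][j]: best score aligning read[:i] so that it ends at window column j.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP  # read bases placed before any reference base cost a gap
    # Row 0 stays zero: skipping leading reference bases is free.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if read[i - 1] == window[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + sub,  # match or mismatch
                          S[i - 1][j] + GAP,      # read base opposite a gap
                          S[i][j - 1] + GAP)      # reference base opposite a gap
    # Trailing reference bases are free: take the best cell in the last row.
    end = max(range(m + 1), key=lambda j: S[n][j])
    # Trace back to find the window column where the read starts.
    i, j = n, end
    while i > 0:
        sub = MATCH if j > 0 and read[i - 1] == window[j - 1] else MISMATCH
        if j > 0 and S[i][j] == S[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return S[n][end], j, end


def align_targeted(read: str, reference: str, target_start: int,
                   pad: int = 5) -> Tuple[int, int, str]:
    """Hypothetical helper: align a read whose start is known a priori.
    Returns (genomic_start, score, method)."""
    # 1) Direct string comparison at the expected position (cheap, O(len(read))).
    if reference[target_start:target_start + len(read)] == read:
        return target_start, len(read) * MATCH, "exact"
    # 2) Fall back to the DP on a small window around the expected position only.
    lo = max(0, target_start - pad)
    window = reference[lo:target_start + len(read) + pad]
    score, start, _ = fit_align(read, window)
    return lo + start, score, "dp"


ref = "ACGTACGTTAGCCATGGATCCAGT"
print(align_targeted("TAGCCATG", ref, 8))  # exact match at position 8
print(align_targeted("TAGCTATG", ref, 8))  # one mismatch, resolved by the DP
```

In this sketch the exact comparison costs time proportional to the read length, whereas the DP costs roughly `len(read) * (len(read) + 2 * pad)` cell updates, so reads that match the reference exactly skip the quadratic step entirely. This is one plausible source of a speed-up of the kind reported, not a description of how the authors measured it.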

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Genomics / methods*
  • Polymorphism, Single Nucleotide
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA*