Multithread Multistring Burrows-Wheeler Transform and Longest Common Prefix Array

J Comput Biol. 2019 Sep;26(9):948-961. doi: 10.1089/cmb.2018.0230. Epub 2019 May 29.

Abstract

Indexing huge collections of strings, such as those produced by the widespread sequencing technologies, heavily relies on multistring generalizations of the Burrows-Wheeler transform (BWT) and the longest common prefix (LCP) array, since solving efficiently both problems are essential ingredients of several algorithms on a collection of strings, such as those for genome assembly. In this article, we explore a multithread computational strategy for building the BWT and LCP array. Our algorithm applies a divide and conquer approach that leads to parallel computation of multistring BWT and LCP array.

Keywords: Burrows–Wheeler transform; longest common prefix array; multithreading; parallel algorithms.

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Sequence Analysis / methods*