d-PBWT: dynamic positional Burrows-Wheeler transform

Bioinformatics. 2021 Aug 25;37(16):2390-2397. doi: 10.1093/bioinformatics/btab117.

Abstract

Motivation: Durbin's positional Burrows-Wheeler transform (PBWT) is a scalable data structure for haplotype matching. It has been successfully applied to identical-by-descent (IBD) segment identification and genotype imputation. Once the PBWT of a haplotype panel is constructed, it supports efficient retrieval of all long segments shared among individuals in the panel (long matches) and efficient queries of an external haplotype against the panel. However, the standard PBWT is an array-based static data structure and does not support dynamic updates of the panel.

Results: Here, we generalize the static PBWT to a dynamic data structure, d-PBWT, where the reverse prefix sorting at each position is stored with linked lists. We also developed efficient algorithms for insertion and deletion of individual haplotypes. In addition, we verified that d-PBWT can support all algorithms of PBWT. In doing so, we systematically investigated variations of set maximal match and long match query algorithms: while they all have average case time complexity independent of database size, they have different worst case complexities and dependencies on additional data structures.
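For readers unfamiliar with the term, the reverse prefix sorting mentioned above can be illustrated with a minimal sketch of the static, array-based construction for a biallelic panel. This is not the d-PBWT implementation; the function name build_prefix_arrays and the list-of-lists panel layout are assumptions made for illustration only. At each site, the current ordering is stably partitioned by allele, which keeps haplotypes sorted by their reversed prefixes. It is these per-site orderings that d-PBWT is described as storing in linked lists rather than arrays.

```python
from typing import List


def build_prefix_arrays(panel: List[List[int]]) -> List[List[int]]:
    """Return the positional prefix arrays of a biallelic haplotype panel.

    panel[i][k] is the allele (0 or 1) of haplotype i at site k.
    The result has one ordering per site boundary: entry k lists the
    haplotype indices sorted by their reversed prefixes ending at site k-1.
    (Hypothetical helper for illustration; not part of any published API.)
    """
    num_haplotypes = len(panel)
    num_sites = len(panel[0]) if panel else 0

    order = list(range(num_haplotypes))  # before any site: input order
    prefix_arrays = [order[:]]

    for k in range(num_sites):
        zeros: List[int] = []  # haplotypes carrying allele 0 at site k
        ones: List[int] = []   # haplotypes carrying allele 1 at site k
        for hap in order:
            if panel[hap][k] == 0:
                zeros.append(hap)
            else:
                ones.append(hap)
        # Stable partition: relative order within each allele class is kept,
        # so the new ordering is sorted by reversed prefix through site k.
        order = zeros + ones
        prefix_arrays.append(order[:])

    return prefix_arrays


if __name__ == "__main__":
    example_panel = [
        [0, 1, 0],
        [1, 1, 0],
        [0, 0, 1],
        [1, 0, 1],
    ]
    for k, arr in enumerate(build_prefix_arrays(example_panel)):
        print(k, arr)
```

Because each per-site ordering in this sketch is a flat array, adding or removing a haplotype would require rebuilding or shifting every array, which is the limitation of the static structure that motivates a dynamic variant.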

Availability and implementation: The benchmarking code is available at genome.ucf.edu/d-PBWT.

Supplementary information: Supplementary data are available at Bioinformatics online.