Scalable Pairwise Whole-Genome Homology Mapping of Long Genomes with BubbZ

iScience. 2020 Jun 26;23(6):101224. doi: 10.1016/j.isci.2020.101224. Epub 2020 Jun 3.

Abstract

Pairwise whole-genome homology mapping is the problem of finding all pairs of homologous intervals between a pair of genomes. As the number of available whole genomes has risen dramatically in recent years, the need for more scalable homology mappers has grown. In this paper, we develop an algorithm (BubbZ) for computing whole-genome pairwise homology mappings, especially in the context of all-to-all comparison of multiple genomes. BubbZ is based on an algorithm for computing chains in compacted de Bruijn graphs. We evaluate BubbZ on simulated datasets, a dataset composed of 16 long mouse genomes, and a large dataset of 1,600 Salmonella genomes. We show a speed improvement of up to approximately an order of magnitude compared with MashMap2 and Minimap2, while retaining similar accuracy.
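
To make the problem statement concrete, the following is a minimal Python sketch of pairwise homology mapping on toy sequences. It is not the BubbZ algorithm: it anchors on exact shared k-mers and greedily merges collinear anchors into interval pairs, and it ignores reverse complements and the compacted de Bruijn graph entirely. The function names (shared_kmer_anchors, chain_anchors) and the parameters k and max_gap are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def shared_kmer_anchors(a, b, k):
    """Return sorted (i, j) pairs such that a[i:i+k] == b[j:j+k]."""
    # Index every k-mer of b by its start positions.
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    # Look up every k-mer of a; each hit is one anchor.
    anchors = []
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            anchors.append((i, j))
    return sorted(anchors)


def chain_anchors(anchors, k, max_gap):
    """Greedily group anchors that advance in both sequences by at most max_gap.

    Returns a list of ((a_start, a_end), (b_start, b_end)) half-open intervals.
    This greedy rule is an illustrative assumption, not an optimal chaining.
    """
    chains = []
    for i, j in anchors:
        for chain in chains:
            last_i, last_j = chain[-1]
            if 0 < i - last_i <= max_gap and 0 < j - last_j <= max_gap:
                chain.append((i, j))
                break
        else:
            # No existing chain can be extended, so start a new one.
            chains.append([(i, j)])
    return [((c[0][0], c[-1][0] + k), (c[0][1], c[-1][1] + k)) for c in chains]


if __name__ == "__main__":
    a = "GGGGACGTTCAGCCCC"
    b = "TTACGTTCAGTT"
    anchors = shared_kmer_anchors(a, b, k=4)
    print(chain_anchors(anchors, k=4, max_gap=2))
```

On these toy inputs the sketch reports a single interval pair, a[4:12] and b[2:10], both of which spell ACGTTCAG. A quadratic anchor scan of this kind does not scale to long genomes, which is the gap that scalable mappers aim to close.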

Keywords: Algorithms; Bioinformatics; Genomics.