RabbitKSSD: accelerating genome distance estimation on modern multi-core architectures

Bioinformatics. 2023 Nov 1;39(11):btad695. doi: 10.1093/bioinformatics/btad695.

Abstract

Summary: We propose RabbitKSSD, a high-speed genome distance estimation tool. Specifically, we leverage load-balanced task partitioning, fast I/O, efficient access to intermediate results, and high-performance data structures to improve overall efficiency. Our performance evaluation demonstrates that RabbitKSSD achieves speedups ranging from 5.7× to 19.8× over Kssd for the time-consuming sketch generation and distance computation steps on commonly used workstations. In addition, it significantly outperforms Mash, BinDash, and Dashing2. Moreover, RabbitKSSD can efficiently perform all-vs-all distance computation for all RefSeq complete bacterial genomes (455 GB in FASTA format) in just 2 min on a 64-core workstation.
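
For readers unfamiliar with the two stages named above (sketch generation and distance computation), the following is a minimal illustrative sketch, not RabbitKSSD's or Kssd's implementation. It assumes a simple hash-modulus k-mer sampling rule in place of Kssd's k-mer substring space sampling, and it converts the Jaccard index to a distance with the Mash formula D = -ln(2j / (1 + j)) / k. The function names and the parameters `k` and `keep_one_in` are hypothetical, chosen only for illustration.

```python
import hashlib
import math
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=8, keep_one_in=4):
    """Sketch generation: keep the hashes of a fixed subsample of k-mers.

    Assumption: hash-modulus sampling stands in for Kssd's sampling scheme.
    """
    kept = set()
    for km in kmers(seq, k):
        digest = hashlib.blake2b(km.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h % keep_one_in == 0:
            kept.add(h)
    return kept


def jaccard(a, b):
    """Jaccard index of two sketches; 0.0 if neither sketch holds any k-mer."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j, k):
    """Convert a Jaccard index j into a Mash-style distance for k-mer size k."""
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


def all_vs_all(sketches, k):
    """Distance computation: one distance per unordered pair of named sketches."""
    return {
        (x, y): mash_distance(jaccard(sketches[x], sketches[y]), k)
        for x, y in combinations(sorted(sketches), 2)
    }
```

In this toy form, the all-vs-all step is a plain loop over genome pairs. The pairwise comparisons are independent of one another, which is what makes this stage a natural candidate for the multi-core partitioning the abstract describes.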

Availability and implementation: RabbitKSSD is available at https://github.com/RabbitBio/RabbitKSSD.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Evolution
  • Genome, Bacterial*
  • Software*