PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead

Lingqi Zhang; Cheng Liu; Shoubin Dong

doi:10.3390/genes10110886

Genes (Basel). 2019 Nov 4;10(11):886. doi: 10.3390/genes10110886.

Authors

Lingqi Zhang¹, Cheng Liu², Shoubin Dong³

Affiliations

¹ Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road 381, Guangzhou 51000, China. cslingqizhang@gmail.com.
² Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road 381, Guangzhou 51000, China. ztcattlepatato@gmail.com.
³ Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road 381, Guangzhou 51000, China. sbdong@scut.edu.cn.

Abstract

(1) Background: DNA sequence alignment is an essential step in genome analysis. BWA-MEM is a prevalent single-node tool for genome alignment because of its high speed and accuracy. Genome data are being generated at an exponentially growing rate, and handling such volumes requires a multi-node solution, which remains a challenge. Spark is a widely used big data platform that has been exploited to help genome alignment meet this challenge. Nonetheless, existing works that use Spark to optimize BWA-MEM suffer from high overhead. (2) Methods: In this paper, we present PipeMEM, a framework that accelerates BWA-MEM with lower overhead by means of the pipe operation in Spark. We additionally propose a pipeline structure and in-memory computation to accelerate PipeMEM further. (3) Results: Our experiments showed that, on paired-end alignment tasks, our framework had low overhead. In a multi-node environment, our framework was, on average, 2.27× faster than BWASpark (an alignment tool in the Genome Analysis Toolkit (GATK)) and 2.33× faster than SparkBWA. (4) Conclusions: PipeMEM can accelerate BWA-MEM in the Spark environment with high performance and low overhead.
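
The pipe operation mentioned in the Methods can be illustrated with a minimal sketch. Spark's `RDD.pipe` writes the elements of each partition, one per line, to the standard input of an external command and returns the command's standard output lines as a new RDD of strings. The Python below only emulates that contract with the standard library so it runs without a cluster. It is not PipeMEM's implementation: the helper names `pipe_partition` and `pipe_all` are hypothetical, and a trivial upper-casing command stands in for the aligner.

```python
import subprocess
import sys


def pipe_partition(records, command):
    """Send one partition's records to an external command, one record per
    line on stdin, and return the command's stdout as a list of lines.
    (Hypothetical helper that mimics what a pipe operation does per partition.)"""
    stdin_text = "".join(record + "\n" for record in records)
    result = subprocess.run(
        command,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the external command fails
    )
    return result.stdout.splitlines()


def pipe_all(partitions, command):
    """Apply the external command to every partition independently and
    concatenate the outputs in partition order. (Hypothetical helper.)"""
    output = []
    for partition in partitions:
        output.extend(pipe_partition(partition, command))
    return output


# Assumption: a stand-in for the external tool. It simply upper-cases its
# input; a real deployment would launch the aligner here instead.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

# Two toy partitions of line-oriented records.
partitions = [["acgt", "ttga"], ["ggcc"]]
output = pipe_all(partitions, STAND_IN_COMMAND)
print(output)
```

Because the external program is launched unchanged and only exchanges text over standard streams, this style of integration needs no modification of the tool itself, which is consistent with the low-overhead goal stated in the abstract.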

Keywords: BWA-MEM; Spark; low overhead.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Big Data
Chromosome Mapping
Genome, Human
High-Throughput Nucleotide Sequencing / methods*
Humans
Sequence Alignment / methods*
Sequence Analysis, DNA / methods*
Software