CURC: a CUDA-based reference-free read compressor

Shaohui Xie; Xiaotian He; Shan He; Zexuan Zhu

doi:10.1093/bioinformatics/btac333

Bioinformatics. 2022 Jun 13;38(12):3294-3296. doi: 10.1093/bioinformatics/btac333.

Authors

Shaohui Xie¹, Xiaotian He¹, Shan He², Zexuan Zhu¹,³

Affiliations

¹ College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China.
² School of Computer Science, University of Birmingham, Birmingham B15 2TT, UK.
³ BGI-Shenzhen, Shenzhen 518083, China.

PMID: 35579371
DOI: 10.1093/bioinformatics/btac333

Abstract

Motivation: The data deluge of high-throughput sequencing (HTS) has posed great challenges to data storage and transfer. Many specialized compression tools have been developed to address this problem. However, most existing compressors are based on central processing unit (CPU) platforms, which can be inefficient and expensive for handling large-scale HTS data. With the popularization of graphics processing units (GPUs), GPU-compatible sequencing data compressors have become desirable to exploit the computing power of GPUs.

Results: We present CURC, a GPU-accelerated reference-free read compressor for FASTQ files. Under a GPU-CPU heterogeneous parallel scheme, CURC implements highly efficient lossless compression of the DNA stream based on the pseudogenome approach and the CUDA library. Compared with other state-of-the-art reference-free read compressors, CURC achieves a 2-6-fold compression speedup with a competitive compression rate.

Availability and implementation: CURC can be downloaded from https://github.com/BioinfoSZU/CURC.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Data Compression*
Gene Library
High-Throughput Nucleotide Sequencing
Sequence Analysis, DNA
