LFastqC: A lossless non-reference-based FASTQ compressor

PLoS One. 2019 Nov 14;14(11):e0224806. doi: 10.1371/journal.pone.0224806. eCollection 2019.

Abstract

The cost-effectiveness of next-generation sequencing (NGS) has advanced genomic research, which now routinely generates large amounts of raw data that often require efficient infrastructure, such as data centers, for storage and transmission. NGS data are highly redundant and need to be compressed efficiently to reduce the cost of storage space and transmission bandwidth. To address these issues, we present LFastqC, a lossless, non-reference-based FASTQ compression algorithm that improves on the LFQC tool. We compare LFastqC with several state-of-the-art compressors, and the results indicate that it achieves better compression ratios on important datasets such as LS454, PacBio, and MinION. Moreover, LFastqC compresses and decompresses faster than LFQC, which was previously the top-performing compression algorithm for the LS454 dataset. LFastqC is freely available at https://github.uconn.edu/sya12005/LFastqC.

MeSH terms

  • Algorithms
  • Data Compression*
  • Databases as Topic
  • High-Throughput Nucleotide Sequencing*
  • Software*

Grants and funding

The authors received no specific funding for this work.