VCFShark: how to squeeze a VCF file

Bioinformatics. 2021 Oct 11;37(19):3358-3360. doi: 10.1093/bioinformatics/btab211.

Abstract

Summary: Variant Call Format (VCF) files holding the results of sequencing projects take up a lot of space. We propose VCFShark, which compresses VCF files up to an order of magnitude better than the de facto standards (gzipped VCF and BCF). Its advantage over competitors is greatest for VCF files containing large amounts of genotype data. Processing speeds of up to 100 MB/s and main memory requirements below 30 GB make the tool usable on typical workstations, even for large datasets.

Availability and implementation: https://github.com/refresh-bio/vcfshark.

Supplementary information: Supplementary data are available at Bioinformatics online.