An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values

Proc Data Compress Conf. 2016 Mar-Apr;2016:221-230. doi: 10.1109/DCC.2016.39. Epub 2016 Dec 19.

Abstract

This paper provides the specification and an initial validation of an evaluation framework for comparing lossy compressors of genome sequencing quality values. The goal is to define the reference data, test sets, tools, and metrics to be used to evaluate the impact of lossy compression of quality values on human genome variant calling. The functionality of the framework is validated using two state-of-the-art genomic compressors. This work has been spurred by the current activity within the ISO/IEC SC29/WG11 technical committee (a.k.a. MPEG), which is investigating the possibility of starting a standardization activity for genomic information representation.
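
The abstract does not name the metrics the framework uses. As a rough illustration of what measuring the impact on variant calling could look like, the sketch below scores a set of called variants against a reference set using precision, recall, and F-score. This choice of metrics is an assumption, not the paper's specification. The names `Variant`, `CallingScores`, and `score_calls` are hypothetical, and the variant sets are toy data invented for the example, not results.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


class CallingScores(NamedTuple):
    precision: float
    recall: float
    f_score: float


def score_calls(called: Set[Variant], truth: Set[Variant]) -> CallingScores:
    """Score a call set against a reference ("truth") set by exact key match."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return CallingScores(precision, recall, f_score)


# Toy data (illustrative only): a reference set and two call sets, one obtained
# from the original quality values and one after lossy compression.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 900, "T", "C"),
}
calls_original = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr3", 5, "A", "T"),  # false positive
}
calls_lossy = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr3", 5, "A", "T"),  # false positive
}

if __name__ == "__main__":
    for label, calls in (("original", calls_original), ("lossy", calls_lossy)):
        scores = score_calls(calls, truth)
        print(f"{label}: precision={scores.precision:.3f} "
              f"recall={scores.recall:.3f} f_score={scores.f_score:.3f}")
```

Comparing the two score tuples gives one simple view of how much a lossy compressor changes variant calling relative to the uncompressed baseline. In practice the call sets would come from a variant caller's output rather than hand-written sets.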