LRez: a C++ API and toolkit for analyzing and managing Linked-Reads data

Bioinform Adv. 2021 Sep 25;1(1):vbab022. doi: 10.1093/bioadv/vbab022. eCollection 2021.

Abstract

Motivation: Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, by using barcodes to tag reads that originate from a common long DNA molecule. These technologies have been employed in a broad range of applications, including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists.

Results: We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez provides functionalities for computing the number of barcodes shared between genomic regions, extracting barcodes from BAM files, and indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve its performance.
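
To make the two core operations concrete (counting barcodes shared between two regions, and fetching every record that carries a given barcode), the following is a minimal C++ sketch. It is not the LRez API: the type aliases and the functions countCommonBarcodes, addToIndex and lookup are hypothetical names chosen for illustration, and it assumes that barcodes are kept as plain strings and that each record is located by a byte offset in an uncompressed file.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical types (not taken from LRez): a barcode is its nucleotide
// string, and a record is located by its byte offset in the file.
using Barcode = std::string;
using BarcodeSet = std::unordered_set<Barcode>;
using BarcodeIndex = std::unordered_map<Barcode, std::vector<std::uint64_t>>;

// Count the barcodes present in both regions by iterating over the
// smaller set and probing the larger one.
std::size_t countCommonBarcodes(const BarcodeSet& a, const BarcodeSet& b) {
    const BarcodeSet& smaller = a.size() <= b.size() ? a : b;
    const BarcodeSet& larger = a.size() <= b.size() ? b : a;
    std::size_t common = 0;
    for (const Barcode& bc : smaller) {
        if (larger.count(bc) != 0) {
            ++common;
        }
    }
    return common;
}

// Record that a read or alignment carrying barcode `bc` starts at `offset`.
void addToIndex(BarcodeIndex& index, const Barcode& bc, std::uint64_t offset) {
    index[bc].push_back(offset);
}

// Return the offsets stored for `bc`, or an empty list if it was never seen.
std::vector<std::uint64_t> lookup(const BarcodeIndex& index, const Barcode& bc) {
    auto it = index.find(bc);
    if (it == index.end()) {
        return {};
    }
    return it->second;
}
```

In this sketch, a caller would fill one BarcodeSet per genomic region and compare them with countCommonBarcodes, or build a BarcodeIndex in a single pass over the file and then seek to each offset returned by lookup. An index of this kind trades memory for the ability to avoid rescanning the whole file on every barcode query.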

Availability and implementation: LRez is implemented in C++, supported on Unix-based platforms and available under the AGPL-3.0 license at https://github.com/morispi/LRez, and as a Bioconda package.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.