RabbitFX: Efficient Framework for FASTA/Q File Parsing on Modern Multi-Core Platforms

Hao Zhang; Honglei Song; Xiaoming Xu; Qixin Chang; Mingkai Wang; Yanjie Wei; Zekun Yin; Bertil Schmidt; Weiguo Liu

doi:10.1109/TCBB.2022.3219114

IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):2341-2348. doi: 10.1109/TCBB.2022.3219114. Epub 2023 Jun 5.

PMID: 36327193

Abstract

The continuous growth of generated sequencing data has led to the development of a variety of associated bioinformatics tools. However, many of them cannot fully exploit the resources of modern multi-core systems because they are bottlenecked by file parsing, which leads to slow execution times. This motivates the design of an efficient method for parsing sequencing data that can exploit the power of modern hardware, especially modern CPUs paired with fast storage devices. We have developed RabbitFX, a fast, efficient, and easy-to-use framework for processing biological sequencing data on modern multi-core platforms. It reads FASTA and FASTQ files efficiently using a lightweight parsing method with an optimized formatting implementation. Furthermore, we provide user-friendly and modularized C++ APIs that can be easily integrated into applications to increase their file parsing speed. As a proof of concept, we have integrated RabbitFX into three I/O-intensive applications: fastp, Ktrim, and Mash. Our evaluation shows that the inclusion of RabbitFX leads to speedups of at least 11.6× (6.6×) for fastp, 2.4× (2.4×) for Ktrim, and 3.7× (3.2×) for Mash compared to the original versions on plain (gzip-compressed) files. These case studies demonstrate that RabbitFX can be easily integrated into a variety of NGS analysis tools to significantly reduce associated runtimes. It is open-source software available at https://github.com/RabbitBio/RabbitFX.
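
To make the idea of lightweight parsing concrete, the following is a minimal C++ sketch of one possible approach: records are represented as views into an in-memory chunk, so no sequence data is copied during parsing. This is an illustration under stated assumptions, not the RabbitFX API or its actual implementation. The names `FastqRecordView`, `next_line`, and `parse_fastq_chunk` are hypothetical, and the sketch assumes four-line FASTQ records in an uncompressed buffer.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical record type (not part of the RabbitFX API): each field is a
// view into the caller's buffer, so parsing copies no sequence data.
struct FastqRecordView {
    std::string_view name;      // header line without the leading '@'
    std::string_view sequence;  // bases
    std::string_view quality;   // per-base quality characters
};

// Returns the line starting at pos (without '\n' or a trailing '\r') and
// advances pos past the line terminator.
inline std::string_view next_line(std::string_view buf, std::size_t& pos) {
    std::size_t end = buf.find('\n', pos);
    if (end == std::string_view::npos) end = buf.size();
    std::string_view line = buf.substr(pos, end - pos);
    pos = (end < buf.size()) ? end + 1 : end;
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
    return line;
}

// Parses the complete four-line FASTQ records at the start of buf.
// Assumption: sequences are not wrapped across lines. Parsing stops at the
// first malformed or incomplete record.
inline std::vector<FastqRecordView> parse_fastq_chunk(std::string_view buf) {
    std::vector<FastqRecordView> records;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::string_view header = next_line(buf, pos);
        if (header.empty() || header.front() != '@' || pos >= buf.size()) break;
        std::string_view seq = next_line(buf, pos);
        if (pos >= buf.size()) break;
        std::string_view plus = next_line(buf, pos);
        if (plus.empty() || plus.front() != '+' || pos >= buf.size()) break;
        std::string_view qual = next_line(buf, pos);
        if (qual.size() != seq.size()) break;  // truncated or corrupt record
        records.push_back({header.substr(1), seq, qual});
    }
    return records;
}
```

Because the returned views point into the input buffer, the buffer must outlive the records. In a multi-threaded setting, one plausible design is for a reader thread to hand whole chunks to worker threads, each of which parses its own chunk independently; whether RabbitFX is organized this way is not stated in the abstract.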

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology*
High-Throughput Nucleotide Sequencing
Software*