FragGeneScanRs: faster gene prediction for short reads

BMC Bioinformatics. 2022 May 28;23(1):198. doi: 10.1186/s12859-022-04736-5.

Abstract

Background: FragGeneScan is currently the most accurate and popular tool for gene prediction in short and error-prone reads, but its execution speed is insufficient for use on larger data sets. The parallelization that should have addressed this is inefficient. Its alternative implementation, FragGeneScan+, is faster but introduces a number of bugs related to memory management, race conditions and even output accuracy.

Results: This paper introduces FragGeneScanRs, a faster Rust implementation of the FragGeneScan gene prediction model. Its command line interface is backward compatible and adds extra features for more flexible usage. Its output is equivalent to the original FragGeneScan implementation.

Conclusions: Compared to the current C implementation, shotgun metagenomic reads are processed up to 22 times faster using a single thread, with better scaling for multithreaded execution. The Rust code of FragGeneScanRs is freely available from GitHub under the GPL-3.0 license with instructions for installation, usage and other documentation (https://github.com/unipept/FragGeneScanRs).
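
As a rough illustration of how the figures above are usually derived, the following sketch computes a single-thread speedup and a parallel efficiency from wall-clock timings. The function names and all timings are hypothetical placeholders chosen for the arithmetic, not measurements from the paper.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the baseline run."""
    return baseline_seconds / new_seconds


def parallel_efficiency(single_thread_seconds: float,
                        multi_thread_seconds: float,
                        threads: int) -> float:
    """Fraction of ideal linear scaling achieved with the given thread count."""
    return speedup(single_thread_seconds, multi_thread_seconds) / threads


# Hypothetical wall-clock timings in seconds (assumed values, not from the paper).
c_single_thread = 220.0     # original C implementation, 1 thread
rust_single_thread = 10.0   # Rust implementation, 1 thread
rust_eight_threads = 2.0    # Rust implementation, 8 threads

print(speedup(c_single_thread, rust_single_thread))
print(parallel_efficiency(rust_single_thread, rust_eight_threads, 8))
```

An efficiency close to 1.0 indicates near-linear scaling; lower values mean additional threads contribute less than proportionally.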

Keywords: Gene prediction; Hidden Markov model; Rust; Shotgun metagenomics.

MeSH terms

  • Algorithms*
  • Metagenome
  • Metagenomics
  • Software*