Finding Maximal Exact Matches Using the r-Index

Massimiliano Rossi; Marco Oliva; Paola Bonizzoni; Ben Langmead; Travis Gagie; Christina Boucher

doi:10.1089/cmb.2021.0445

Finding Maximal Exact Matches Using the r-Index

J Comput Biol. 2022 Feb;29(2):188-194. doi: 10.1089/cmb.2021.0445. Epub 2022 Jan 17.

Authors

Massimiliano Rossi¹, Marco Oliva¹, Paola Bonizzoni², Ben Langmead³, Travis Gagie⁴, Christina Boucher¹

Affiliations

¹ Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida, USA.
² Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy.
³ Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.
⁴ Faculty of Computer Science, Dalhousie University, Halifax, Canada.

Abstract

Efficiently finding maximal exact matches (MEMs) between a sequence read and a database of genomes is a key first step in read alignment. But until recently, it was unknown how to build a data structure in [Formula: see text] space that supports efficient MEM finding, where r is the number of runs in the Burrows-Wheeler Transform. In 2021, Rossi et al. showed how to build a small auxiliary data structure called thresholds in addition to the r-index in [Formula: see text] space. This addition enables efficient MEM finding using the r-index. In this article, we present the tool that implements this solution, which we call MONI. Namely, we give a high-level view of the main components of the data structure and show how the source code can be downloaded, compiled, and used to find MEMs between a set of sequence reads and a set of genomes.

Keywords: MEM finding; r-index; run-length-encoded Burrows–Wheeler transform; thresholds.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Computational Biology
Databases, Genetic / statistics & numerical data
Genome, Human
Genomics / statistics & numerical data
Humans
Sequence Alignment / statistics & numerical data*
Sequence Analysis, DNA / statistics & numerical data
Software*

Abstract

Publication types

MeSH terms

Grants and funding