ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies

PeerJ. 2021 May 5:9:e11348. doi: 10.7717/peerj.11348. eCollection 2021.

Abstract

TQMD is a tool for high-performance computing clusters that downloads, stores and produces lists of dereplicated prokaryotic genomes. It was developed to counter the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. To remain both efficient and scalable, it combines word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy. We studied the performance of TQMD by examining the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that, unlike these two tools, TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), although, like them, it also works at lower taxonomic levels (species/strain). TQMD is available from source and as a Singularity container at https://bitbucket.org/phylogeno/tqmd.
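The abstract names three ingredients: alignment-free comparison based on k-mers, iterative single-linkage clustering, and a divide-and-conquer strategy. The following minimal Python sketch shows one way these pieces can fit together. It rests on several assumptions. Similarity is the Jaccard index on exact k-mer sets, whereas production tools often estimate it with MinHash sketches such as Mash. The similarity threshold and pack size are arbitrary. Sequence length stands in as a hypothetical proxy for a genome-quality score. All names (`kmer_set`, `single_linkage`, `dereplicate`, `pack_size`) are illustrative and are not TQMD's code or command-line interface.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> frozenset[str]:
    """Return the set of distinct k-mers (words of length k) in a sequence."""
    seq = seq.upper()
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity between two k-mer sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def single_linkage(ids, kmers, threshold):
    """Link every pair with similarity >= threshold; clusters are the
    connected components of that graph, found with union-find."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def dereplicate(genomes, k=21, threshold=0.9, pack_size=100):
    """Divide-and-conquer dereplication sketch (hypothetical parameters).

    genomes: dict mapping genome id -> sequence.
    Each round splits the current representatives into packs, clusters each
    pack independently and keeps one representative per cluster (the longest
    sequence, a stand-in for a genome-quality score).
    """
    kmers = {gid: kmer_set(seq, k) for gid, seq in genomes.items()}
    reps = sorted(genomes)
    while True:
        packs = [reps[i:i + pack_size] for i in range(0, len(reps), pack_size)]
        next_reps = sorted(
            max(cluster, key=lambda g: len(genomes[g]))
            for pack in packs
            for cluster in single_linkage(pack, kmers, threshold)
        )
        # Stop once all representatives fitted in a single pack, or when a
        # round no longer reduces the number of representatives.
        if len(packs) == 1 or len(next_reps) == len(reps):
            return next_reps
        reps = next_reps


genomes = {
    "g1": "ACGTACGTACGTACGT",
    "g2": "ACGTACGTACGTACGA",
    "g3": "TTTTGGGGCCCCAAAA",
    "g4": "TTTTGGGGCCCCAAAT",
}
print(dereplicate(genomes, k=4, threshold=0.5, pack_size=2))  # ['g1', 'g3']
```

Because packs are clustered independently, two similar genomes that fall into different packs are only compared in a later round, after earlier rounds have shrunk the list. This trade-off keeps each all-versus-all comparison small, which is what makes a divide-and-conquer approach scale to large genome collections.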

Keywords: Alignment-free methods; Dereplication; Genome quality; Genome selection; Metagenomics; NCBI RefSeq; Phylogenomics; Prokaryotes; Singularity.

Associated data

  • figshare/10.6084/m9.figshare.13238936.v2

Grants and funding

Raphaël R. Léonard and Mick Van Vlierberghe were supported by FRIA fellowships of the Belgian National Fund for Scientific Research (F.R.S.-FNRS). Marie Leleu is supported by the French Agence Nationale de la Recherche (ANR, project MATHTEST). Frédéric Kerff is a Research Associate employed by the F.R.S.-FNRS. Computational resources were provided through two grants to DB (University of Liège “Crédit de démarrage 2012” SFRD-12/04; F.R.S.-FNRS “Crédit de recherche 2014” CDR J.0080.15). This work (and Luc Cornet) was also supported by a research grant to DB (no. B2/191/P2/BCCM GEN-ERA) funded by the Belgian Science Policy Office (BELSPO). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.