Taxonize-gb: A tool for filtering GenBank non-redundant databases based on taxonomy

bioRxiv [Preprint]. 2024 Mar 27:2024.03.22.586347. doi: 10.1101/2024.03.22.586347.

Abstract

Identifying taxa and analyzing taxonomic diversity in ecological samples has become a crucial routine in various research and industrial fields. While DNA barcoding marker-gene approaches were once prevalent, the decreasing costs of next-generation sequencing have made metagenomic shotgun sequencing more popular and feasible. In contrast to DNA barcoding, metagenomic shotgun sequencing offers possibilities for in-depth characterization of structural and functional diversity. However, analysis of such data is still considered a hurdle due to the absence of taxa-specific databases. Here we present taxonize-gb, a command-line software tool that extracts the portions of the GenBank non-redundant nucleotide and protein databases related to one or more input taxonomy identifiers. Our tool allows the creation of taxa-specific reference databases tailored to specific research questions, which reduces search times and therefore represents a practical solution for researchers analyzing large metagenomic datasets on a regular basis. Taxonize-gb is an open-source, Python-based command-line tool freely available for installation at https://pypi.org/project/taxonize-gb/ and on GitHub at https://github.com/msabrysarhan/taxonize_genbank. It is released under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
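
The general idea of taxonomy-based filtering can be illustrated with a minimal sketch. It does not reproduce taxonize-gb's code or command-line interface; the function names, the toy parent map, and the accessions below are all hypothetical. It assumes that a child-to-parent taxonomy map and an accession-to-taxid map have already been loaded into memory, expands the input taxonomy identifiers to all of their descendants, and keeps only the records assigned to that set.

```python
from collections import defaultdict, deque


def build_children(parent_of):
    """Invert a child -> parent map into a parent -> [children] map."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if child != parent:  # skip a root node that lists itself as parent
            children[parent].append(child)
    return children


def descendants(root_ids, parent_of):
    """Return the given taxids plus every taxid below them (breadth-first)."""
    children = build_children(parent_of)
    seen = set(root_ids)
    queue = deque(root_ids)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def filter_records(records, acc_to_taxid, wanted):
    """Keep (accession, sequence) pairs whose taxid is in the wanted set."""
    return [(acc, seq) for acc, seq in records if acc_to_taxid.get(acc) in wanted]


# Toy, heavily simplified tree (assumption: illustrative IDs, not real data).
# Node 1 is the root; 10 and 20 are two top-level groups.
parent_of = {1: 1, 10: 1, 20: 1, 11: 10, 12: 11, 21: 20}

# Hypothetical accessions mapped to taxids in the toy tree.
acc_to_taxid = {"ACC_A": 12, "ACC_B": 21, "ACC_C": 10}
records = [("ACC_A", "ATGC"), ("ACC_B", "GGCC"), ("ACC_C", "TTAA")]

wanted = descendants([10], parent_of)   # taxid 10 and everything below it
kept = filter_records(records, acc_to_taxid, wanted)
```

In this toy run, only the records assigned to taxid 10 or its descendants are kept. With real data, the two maps would be far too large to handle this casually, so a practical tool would need to stream the input files instead of holding every record in a list.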

Publication types

  • Preprint