grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories

Bioinformatics. 2020 Jun 1;36(11):3607-3609. doi: 10.1093/bioinformatics/btaa167.

Abstract

Summary: High-throughput sequencing is a powerful technique for addressing biological questions. Grabseqs streamlines access to publicly available metagenomic data by providing a single, easy-to-use interface for downloading data and metadata from multiple repositories, including the Sequence Read Archive (SRA), the Metagenomics Rapid Annotation through Subsystems Technology (MG-RAST) server and iMicrobe. With a single grabseqs command, users can download data and metadata in a standardized format for any number of samples or projects in a given repository.
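The "one command per repository" interface can be sketched as a small Python wrapper. This is a minimal illustration, not the grabseqs API. It assumes that grabseqs exposes one subcommand per repository (`sra`, `mgrast`, `imicrobe`) and takes sample or project accessions as positional arguments. Confirm both assumptions with `grabseqs -h` on your installation. The names `SUBCOMMANDS`, `build_grabseqs_command` and `fetch` are hypothetical helpers introduced here.

```python
import shlex
import subprocess

# Hypothetical mapping from repository name to the grabseqs subcommand
# assumed to handle it (verify against `grabseqs -h`).
SUBCOMMANDS = {
    "SRA": "sra",
    "MG-RAST": "mgrast",
    "iMicrobe": "imicrobe",
}


def build_grabseqs_command(repository, accessions):
    """Return the argument list for a single grabseqs call (assumed CLI shape)."""
    if repository not in SUBCOMMANDS:
        raise ValueError(f"unsupported repository: {repository!r}")
    if not accessions:
        raise ValueError("at least one accession is required")
    # One call covers every requested sample or project in the repository.
    return ["grabseqs", SUBCOMMANDS[repository], *accessions]


def fetch(repository, accessions, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    cmd = build_grabseqs_command(repository, accessions)
    print(shlex.join(cmd))
    if not dry_run:
        # Requires grabseqs on PATH; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `fetch("SRA", ["<project accession>"])` only prints the command it would run. Passing `dry_run=False` hands it to grabseqs. The dry-run default keeps the sketch safe to try before grabseqs is installed.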

Availability and implementation: Grabseqs is an open-source tool implemented in Python and licensed under the MIT license. The source code is freely available at https://github.com/louiejtaylor/grabseqs, and the package is distributed through the Python Package Index and the Anaconda Cloud repository.

Contact: bushman@pennmedicine.upenn.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • High-Throughput Nucleotide Sequencing*
  • Metadata*
  • Metagenome
  • Metagenomics
  • Software