grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories

Louis J Taylor; Arwa Abbas; Frederic D Bushman

doi:10.1093/bioinformatics/btaa167

Bioinformatics. 2020 Jun 1;36(11):3607-3609. doi: 10.1093/bioinformatics/btaa167.

Authors

Louis J Taylor¹, Arwa Abbas², Frederic D Bushman¹

Affiliations

¹ Department of Microbiology, Perelman School of Medicine, University of Pennsylvania.
² Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.

Abstract

Summary: High-throughput sequencing is a powerful technique for addressing biological questions. Grabseqs streamlines access to publicly available metagenomic data by providing a single, easy-to-use interface to download data and metadata from multiple repositories, including the Sequence Read Archive, the Metagenomics Rapid Annotation through Subsystems Technology server and iMicrobe. Users can download data and metadata in a standardized format from any number of samples or projects from a given repository with a single grabseqs command.
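As a rough illustration of the "single grabseqs command" described above, the following Python sketch assembles such a command line without running it. The subcommand names (`sra`, `mgrast`, `imicrobe`) and the `-o` (output directory) and `-m` (metadata file) options are assumptions based on the tool's public documentation, not on this abstract, and should be checked against `grabseqs --help`. The helper `build_grabseqs_command` and the accession `SRP000000` are hypothetical placeholders.

```python
import shlex

# Assumed subcommand names, one per repository named in the abstract.
REPOSITORIES = {"sra", "mgrast", "imicrobe"}


def build_grabseqs_command(repository, accessions, outdir=None, metadata=None):
    """Build (but do not run) an argument list for a grabseqs invocation.

    The -o and -m flags are assumptions; verify them with `grabseqs --help`.
    """
    if repository not in REPOSITORIES:
        raise ValueError(f"unknown repository: {repository!r}")
    if not accessions:
        raise ValueError("at least one sample or project accession is required")

    cmd = ["grabseqs", repository]
    if outdir is not None:
        cmd += ["-o", outdir]      # assumed flag: output directory
    if metadata is not None:
        cmd += ["-m", metadata]    # assumed flag: metadata table file name
    cmd += list(accessions)        # any number of samples or projects
    return cmd


# Placeholder accession, not a real project identifier.
cmd = build_grabseqs_command(
    "sra", ["SRP000000"], outdir="reads", metadata="metadata.csv"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where grabseqs is installed. Building the list separately keeps the sketch runnable without network access.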

Availability and implementation: Grabseqs is an open-source tool implemented in Python and licensed under the MIT license. The source code is freely available at https://github.com/louiejtaylor/grabseqs, the Python Package Index and Anaconda Cloud repository.

Contact: bushman@pennmedicine.upenn.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

High-Throughput Nucleotide Sequencing*
Metadata*
Metagenome
Metagenomics
Software
