Refgenie: a reference genome resource manager

Gigascience. 2020 Feb 1;9(2):giz149. doi: 10.1093/gigascience/giz149.

Abstract

Background: Reference genome assemblies are essential for high-throughput sequencing analysis projects. Typically, genome assemblies are stored on disk alongside related resources; e.g., many sequence aligners require the assembly to be indexed. The resulting indexes are broadly applicable for downstream analysis, so it makes sense to share them. However, there is no simple tool for sharing them.

Results: Here, we introduce refgenie, a reference genome assembly asset manager. Refgenie makes it easier to organize, retrieve, and share genome analysis resources. In addition to genome indexes, refgenie can manage any files related to reference genomes, including sequences and annotation files. Refgenie includes a command line interface and a server application that provides a RESTful API, so it is useful for both tool development and analysis.
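To make the idea of an asset manager concrete, the following is a minimal sketch of a registry that maps a genome name and an asset name to a location on disk. It is an illustration only and does not reproduce refgenie's actual API: the class `AssetRegistry`, its methods `add`, `locate`, and `list_assets`, the root `/genomes`, and the example paths are all hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class AssetRegistry:
    """Hypothetical in-memory registry; NOT refgenie's actual API."""

    root: PurePosixPath
    _assets: dict = field(default_factory=dict)

    def add(self, genome: str, asset: str, relative_path: str) -> None:
        # Record where an asset for a given genome lives, relative to root.
        self._assets[(genome, asset)] = self.root / relative_path

    def locate(self, genome: str, asset: str) -> PurePosixPath:
        # Return the registered path, or raise a descriptive error.
        try:
            return self._assets[(genome, asset)]
        except KeyError:
            raise KeyError(
                f"no asset '{asset}' registered for genome '{genome}'"
            ) from None

    def list_assets(self, genome: str) -> list:
        # Names of all assets registered for one genome, sorted.
        return sorted(a for g, a in self._assets if g == genome)


# Example usage with made-up genome and asset names.
registry = AssetRegistry(root=PurePosixPath("/genomes"))
registry.add("hg38", "fasta", "hg38/fasta/hg38.fa")
registry.add("hg38", "bowtie2_index", "hg38/bowtie2_index")

print(registry.locate("hg38", "fasta"))
print(registry.list_assets("hg38"))
```

Under this assumed design, an analysis tool asks the registry for an asset by name instead of hard-coding a file path, which is what allows the same pipeline to run unchanged across computing environments.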

Conclusions: Refgenie streamlines sharing genome analysis resources among groups and across computing environments. Refgenie is available at https://refgenie.databio.org.

Keywords: data management; data portability; reference assemblies; reference genomes.

MeSH terms

  • Computational Biology
  • Genome / genetics*
  • High-Throughput Nucleotide Sequencing / standards
  • Molecular Sequence Annotation / standards
  • Reference Standards*
  • Software*