iRODS metadata management for a cancer genome analysis workflow

BMC Bioinformatics. 2019 Jan 15;20(1):29. doi: 10.1186/s12859-018-2576-5.

Abstract

Background: The massive amounts of data generated by next-generation sequencing (NGS) methods pose challenges for data security, storage and metadata management. While a broad range of data analysis pipelines exists, these challenges remain largely unaddressed to date.

Results: We describe the integration of the open-source metadata management system iRODS (Integrated Rule-Oriented Data System) with a cancer genome analysis pipeline in a high-performance computing (HPC) environment. The system supports customized metadata attributes as well as fine-grained protection rules and is augmented by a user-friendly front end for metadata input. The result is a robust, efficient end-to-end workflow that addresses data security, central storage and unified metadata.

Conclusions: Integrating iRODS with an NGS data analysis pipeline is a suitable method for addressing the challenges of data security, storage and metadata management in NGS environments.
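The abstract names customized metadata attributes and fine-grained protection rules without showing how they interact. As a conceptual sketch only, the Python below models iRODS-style user metadata as attribute-value-unit (AVU) triples attached to data objects, with a per-user access level that gates both annotation and search. The class names, the permission ordering, the rule that annotating requires write access, and the example path and attributes (`sample_id`, `coverage`) are hypothetical; this is not the iRODS API, python-irodsclient, or the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of iRODS-style metadata; NOT the real iRODS API.


@dataclass(frozen=True)
class AVU:
    """One attribute-value-unit triple, the form iRODS uses for user metadata."""
    attribute: str
    value: str
    unit: str = ""


# Assumed access levels, ordered weakest to strongest (loosely modeled on iRODS read/write/own).
LEVELS = {"null": 0, "read": 1, "write": 2, "own": 3}


@dataclass
class DataObject:
    path: str
    metadata: list = field(default_factory=list)
    acl: dict = field(default_factory=dict)  # user name -> access level

    def level(self, user: str) -> int:
        # Users without an ACL entry get no access.
        return LEVELS[self.acl.get(user, "null")]

    def add_avu(self, user: str, avu: AVU) -> None:
        # Assumption for this sketch: only users with at least write access may annotate.
        if self.level(user) < LEVELS["write"]:
            raise PermissionError(f"{user} may not annotate {self.path}")
        self.metadata.append(avu)


def query(objects, user, attribute, value):
    """Return paths of objects the user can read whose metadata has attribute == value."""
    return [
        obj.path
        for obj in objects
        if obj.level(user) >= LEVELS["read"]
        and any(m.attribute == attribute and m.value == value for m in obj.metadata)
    ]


# Usage with a hypothetical tumor BAM file and users.
bam = DataObject("/tempZone/home/lab/P001_tumor.bam", acl={"alice": "own", "bob": "read"})
bam.add_avu("alice", AVU("sample_id", "P001"))
bam.add_avu("alice", AVU("coverage", "60", "x"))
print(query([bam], "bob", "sample_id", "P001"))    # bob can read, so the path is found
print(query([bam], "carol", "sample_id", "P001"))  # carol has no access, so nothing is found
```

Here a user with read access can find the file by its metadata but cannot change it, while a user without an ACL entry sees nothing. In an actual iRODS deployment, metadata is attached with the `imeta` command or a client library, and access is enforced server-side by ACLs and the rule engine.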

Keywords: Data consistency; Data security; Genome analysis; High performance computing (HPC); Metadata management; Next generation sequencing (NGS); Workflow integration; iRODS.

MeSH terms

  • Computer Security
  • Computing Methodologies*
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Metadata*
  • Neoplasms / genetics*
  • Polymorphism, Genetic
  • Sequence Analysis, DNA / methods*
  • Software*
  • Workflow