Provenance in bioinformatics workflows

BMC Bioinformatics. 2013;14 Suppl 11(Suppl 11):S6. doi: 10.1186/1471-2105-14-S11-S6. Epub 2013 Nov 4.

Abstract

In this work, we used the PROV-DM model to manage data provenance in workflows of genome projects. This provenance model stores the details of a workflow execution, e.g., raw and produced data, and the computational tools used, with their versions and parameters. Using this model, biologists can access the details of a particular workflow execution, compare results produced by different executions, and plan new experiments more efficiently. In addition, a provenance simulator was created, which facilitates the inclusion of provenance data from a genome project workflow execution. Finally, we discuss a case study that aims to identify genes involved in specific metabolic pathways of Bacillus cereus, as well as to compare this isolate with other phylogenetically related bacteria from the Bacillus group. The B. cereus isolate is an extremophilic bacterium collected in warm water in the Midwestern Region of Brazil, and its DNA samples were sequenced with an NGS machine.
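As an illustration only, the kind of record described above can be sketched with the three core PROV-DM types (entity, activity, agent) and the "used" and "was generated by" relations. This is a minimal sketch in plain Python, not the authors' implementation or their simulator; the class layout, the helper diff_parameters, and every file name, tool name, version, and parameter value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """PROV-DM entity: a piece of data, e.g. raw reads or an assembly."""
    id: str


@dataclass(frozen=True)
class Agent:
    """PROV-DM agent: here, a computational tool and its version."""
    id: str
    version: str


@dataclass
class Activity:
    """PROV-DM activity: one execution of a workflow step."""
    id: str
    agent: Agent                                    # wasAssociatedWith
    parameters: dict
    used: list = field(default_factory=list)        # used
    generated: list = field(default_factory=list)   # wasGeneratedBy (inverse)


def diff_parameters(a: Activity, b: Activity) -> dict:
    """Return the parameters whose values differ between two executions."""
    keys = set(a.parameters) | set(b.parameters)
    return {
        k: (a.parameters.get(k), b.parameters.get(k))
        for k in sorted(keys)
        if a.parameters.get(k) != b.parameters.get(k)
    }


# Hypothetical data: two executions of the same step on the same input.
reads = Entity("reads.fastq")
assembler = Agent("assembler", "1.0")
run1 = Activity("assembly-run-1", assembler, {"kmer": 31},
                used=[reads], generated=[Entity("contigs_run1.fasta")])
run2 = Activity("assembly-run-2", assembler, {"kmer": 51},
                used=[reads], generated=[Entity("contigs_run2.fasta")])

print(diff_parameters(run1, run2))
```

Recording the tool version and parameters on each activity is what makes two executions comparable: the same input entity appears in both runs, and only the differing parameter is reported.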

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacillus cereus / genetics
  • Computational Biology / methods*
  • Genome
  • Software*
  • Workflow