Simplifying the development of portable, scalable, and reproducible workflows

Elife. 2021 Oct 13;10:e71069. doi: 10.7554/eLife.71069.

Abstract

Command-line software plays a critical role in biology research. However, processes for installing and executing software differ widely. The Common Workflow Language (CWL) is a community standard that addresses this problem. Using CWL, tool developers can formally describe a tool's inputs, outputs, and other execution details. CWL documents can include instructions for executing tools inside software containers. Accordingly, CWL tools are portable: they can be executed on diverse computers, including personal workstations, high-performance clusters, or the cloud. CWL also supports workflows, which describe dependencies among tools and use outputs from one tool as inputs to others. To date, CWL has been used primarily for batch processing of large datasets, especially in genomics, but it can also be used for the analytical steps of a study. This article explains key concepts about CWL and software containers and provides examples for using CWL in biology research. CWL documents are text-based, so they can be created manually, without computer programming. However, the difficulty of ensuring that these documents conform to the CWL specification may prevent some users from adopting CWL. To address this gap, we created ToolJig, a Web application that enables researchers to create CWL documents interactively. ToolJig validates information provided by the user to ensure it is complete and valid. After creating a CWL tool or workflow, the user can create 'input-object' files, which store values for a particular invocation of a tool or workflow. In addition, ToolJig provides examples of how to execute the tool or workflow via a workflow engine. ToolJig and our examples are available at https://github.com/srp33/ToolJig.
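
To make the relationship between a CWL tool description and an 'input-object' file more concrete, the following Python sketch builds both as plain dictionaries and serializes them as JSON (CWL documents may be written in JSON as well as YAML). It is an illustration only and is not taken from the article or from ToolJig. The wrapped command (`wc -l`), the container image (`debian:stable-slim`), the input and output names, and the helper functions are all assumptions chosen for the example.

```python
import json


def make_tool():
    """Return a minimal CWL CommandLineTool description as a plain dict.

    Hypothetical example: wraps `wc -l` to count the lines in one file.
    """
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": ["wc", "-l"],
        # Container instructions; the image name is an assumption.
        "hints": {"DockerRequirement": {"dockerPull": "debian:stable-slim"}},
        # One required input, passed as the first positional argument.
        "inputs": {
            "text_file": {"type": "File", "inputBinding": {"position": 1}},
        },
        # Capture standard output into a file and expose it as an output.
        "stdout": "line_count.txt",
        "outputs": {
            "line_count": {"type": "stdout"},
        },
    }


def make_input_object(path):
    """Return an input object: the values for one invocation of the tool."""
    return {"text_file": {"class": "File", "path": path}}


def missing_inputs(tool, input_object):
    """List declared inputs that the input object does not supply.

    Simplified check: it ignores optional types and default values.
    """
    return sorted(name for name in tool["inputs"] if name not in input_object)


if __name__ == "__main__":
    tool = make_tool()
    job = make_input_object("example.txt")
    print(json.dumps(tool, indent=2))
    print(json.dumps(job, indent=2))
    print("Missing inputs:", missing_inputs(tool, job))
```

If the two printed JSON documents were saved as, say, `count_lines.cwl` and `count_lines_job.json` (hypothetical file names), a CWL-compliant engine such as the reference runner could be invoked along the lines of `cwltool count_lines.cwl count_lines_job.json`. The `missing_inputs` check is only a rough stand-in for the kind of completeness validation that the abstract attributes to ToolJig.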

Keywords: Common Workflow Language; Web application; command-line software; computational biology; computational workflows; learn by example; research reproducibility; systems biology.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biomedical Research*
  • Data Mining
  • Databases, Genetic
  • Gene Expression Profiling
  • Genomics*
  • Programming Languages*
  • Reproducibility of Results
  • Software Design*
  • User-Computer Interface
  • Workflow*