Utility of the Python package Geoweaver_cwl for improving workflow reusability: an illustration with multidisciplinary use cases

Earth Sci Inform. 2023;16(3):2955-2961. doi: 10.1007/s12145-023-01045-0. Epub 2023 Jul 10.

Abstract

Computational workflows are widely used in data analysis because they automate the tracking of processing steps and the storage of provenance information, which supports innovation and decision-making in the scientific community. However, the growing popularity of workflows has raised concerns about reproducibility and reusability, and shortcomings in either can hinder collaboration between institutions and users. To address these concerns, it is important to standardize workflows or to provide tools that offer a framework for describing workflows and enabling computational reusability. One such set of standards that has recently emerged is the Common Workflow Language (CWL), which offers a robust and flexible framework for describing data analysis tools and workflows. To promote portability, reproducibility, and interoperability of AI/ML workflows, we developed geoweaver_cwl, a Python package that automatically translates AI/ML workflows from a workflow management system (WfMS) named Geoweaver into CWL. In this paper, we test the package on multiple use cases from different domains to demonstrate and verify its utility. We make all code and datasets openly available online and briefly describe the experimental implementation of the package, confirming that geoweaver_cwl can lead to a well-versed AI process while revealing opportunities for further extensions. The geoweaver_cwl package is publicly released at https://pypi.org/project/geoweaver-cwl/0.0.1/ and exemplar results are accessible at https://github.com/amrutakale08/geoweaver_cwl-usecases.

Keywords: Common Workflow Language; Interoperability; Provenance documentation; Reproducibility; Workflow Platforms.