Towards an Internet of Science

J Integr Bioinform. 2019 May 30;16(3):20190024. doi: 10.1515/jib-2019-0024.

Abstract

Big data and complex analysis workflows (pipelines) are common challenges in data-driven sciences such as bioinformatics. A large number of computational tools are available for data analysis. Additionally, many workflow management systems have been developed to piece such tools together into data analysis pipelines. For example, more than 50 computational tools for read mapping are available, representing a large amount of duplicated effort. Furthermore, it is unclear whether these tools are correct, and only a few have a user base large enough to have encountered and reported most of the potential problems. Bringing together many largely untested tools in a computational pipeline is bound to lead to unpredictable results. Yet, this is the current state. While data analysis is presently performed on personal computers, workstations, and clusters, the future will see development and analysis shift to the cloud. None of the existing workflow management systems is ready for this transition. This presents the opportunity to build a new system that overcomes the current duplication of effort, introduces proper testing, allows for development and analysis in public and private clouds, and includes reporting features leading to interactive documents.
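To illustrate what wrapping an external tool as one tested pipeline step can look like, the following is a minimal Python sketch; the mapper command line, file paths, and the regression-test setup are assumptions for illustration only and are not taken from the article or from any particular workflow management system.

    import subprocess
    import unittest

    def run_read_mapper(reads_path, reference_path, output_path):
        """One pipeline step: call an external read-mapping tool.

        The command below is hypothetical; real mappers use different flags.
        """
        subprocess.run(
            ["hypothetical-mapper",
             "--reads", reads_path,
             "--reference", reference_path,
             "--out", output_path],
            check=True,  # fail loudly instead of silently propagating errors
        )
        return output_path

    class ReadMapperTest(unittest.TestCase):
        """Regression test: run the step on a tiny known input and compare
        the result against a previously verified output file."""

        def test_known_input(self):
            out = run_read_mapper("tests/reads.fq", "tests/ref.fa", "tests/out.sam")
            with open(out) as got, open("tests/expected.sam") as want:
                self.assertEqual(got.read(), want.read())

    if __name__ == "__main__":
        unittest.main()

Even a small test of this kind, run automatically for every tool in a pipeline, is one way the "proper testing" called for above could be introduced without changing the tools themselves.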

Keywords: code smells; computational pipelines; internet of things; scientific computing; workflow management.

MeSH terms

  • Computational Biology*
  • Internet*
  • Software*
  • User-Computer Interface*
  • Workflow*