Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges

Kyle Ellrott; Alex Buchanan; Allison Creason; Michael Mason; Thomas Schaffter; Bruce Hoff; James Eddy; John M Chilton; Thomas Yu; Joshua M Stuart; Julio Saez-Rodriguez; Gustavo Stolovitzky; Paul C Boutros; Justin Guinney

doi:10.1186/s13059-019-1794-0

Genome Biol. 2019 Sep 10;20(1):195. doi: 10.1186/s13059-019-1794-0.

Authors

Kyle Ellrott¹, Alex Buchanan¹, Allison Creason¹, Michael Mason², Thomas Schaffter³, Bruce Hoff², James Eddy², John M Chilton⁴, Thomas Yu², Joshua M Stuart⁵, Julio Saez-Rodriguez⁶,⁷, Gustavo Stolovitzky³, Paul C Boutros⁸,⁹,¹⁰,¹¹,¹², Justin Guinney¹³,¹⁴

Affiliations

¹ Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA.
² Sage Bionetworks, Seattle, WA, USA.
³ IBM Research, Yorktown Heights, NY, USA.
⁴ Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, State College, PA, USA.
⁵ University of California, Santa Cruz, Santa Cruz, CA, USA.
⁶ Institute for Computational Biomedicine, Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Bioquant, Heidelberg, Germany.
⁷ Joint Research Center for Computational Biomedicine, RWTH Aachen University, Faculty of Medicine, Aachen, Germany.
⁸ Ontario Institute for Cancer Research, Toronto, Canada.
⁹ Departments of Medical Biophysics and Pharmacology & Toxicology, University of Toronto, Toronto, Canada.
¹⁰ Departments of Human Genetics and Urology, University of California, Los Angeles, CA, USA.
¹¹ Jonsson Comprehensive Cancer Centre, University of California, Los Angeles, CA, USA.
¹² Institute for Precision Health, University of California, Los Angeles, CA, USA.
¹³ Sage Bionetworks, Seattle, WA, USA. justin.guinney@sagebase.org.
¹⁴ Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98195, USA. justin.guinney@sagebase.org.

Abstract

Challenges are achieving broad acceptance for addressing many biomedical questions and enabling tool assessment. But ensuring that the methods evaluated are reproducible and reusable is complicated by the diversity of software architectures, input and output file formats, and computing environments. To mitigate these problems, some challenges have leveraged new virtualization and compute methods, requiring participants to submit cloud-ready software packages. We review recent data challenges with innovative approaches to model reproducibility and data sharing, and outline key lessons for improving quantitative biomedical data analysis through crowd-sourced benchmarking challenges.

Publication types

Letter
Research Support, N.I.H., Extramural

MeSH terms

Algorithms*
Benchmarking
Information Dissemination
Models, Biological
Reproducibility of Results

Associated data

figshare/10.6084/m9.figshare.3115156.v2
