A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses

Heidi Seibold; Severin Czerny; Siona Decke; Roman Dieterle; Thomas Eder; Steffen Fohr; Nico Hahn; Rabea Hartmann; Christoph Heindl; Philipp Kopper; Dario Lepke; Verena Loidl; Maximilian Mandl; Sarah Musiol; Jessica Peter; Alexander Piehler; Elio Rojas; Stefanie Schmid; Hannah Schmidt; Melissa Schmoll; Lennart Schneider; Xiao-Yin To; Viet Tran; Antje Völker; Moritz Wagner; Joshua Wagner; Maria Waize; Hannah Wecker; Rui Yang; Simone Zellner; Malte Nalenz

doi:10.1371/journal.pone.0251194

PLoS One. 2021 Jun 21;16(6):e0251194. doi: 10.1371/journal.pone.0251194. eCollection 2021.

Authors

Heidi Seibold¹ ² ³ ⁴, Severin Czerny¹, Siona Decke¹, Roman Dieterle¹, Thomas Eder¹, Steffen Fohr¹, Nico Hahn¹, Rabea Hartmann¹, Christoph Heindl¹, Philipp Kopper¹, Dario Lepke¹, Verena Loidl¹, Maximilian Mandl¹, Sarah Musiol¹, Jessica Peter¹, Alexander Piehler¹, Elio Rojas¹, Stefanie Schmid¹, Hannah Schmidt¹, Melissa Schmoll¹, Lennart Schneider¹, Xiao-Yin To¹, Viet Tran¹, Antje Völker¹, Moritz Wagner¹, Joshua Wagner¹, Maria Waize¹, Hannah Wecker¹, Rui Yang¹, Simone Zellner¹, Malte Nalenz¹

Affiliations

¹ Department of Statistics, LMU Munich, Munich, Germany.
² Data Science Group, University of Bielefeld, Bielefeld, Germany.
³ Helmholtz AI, Helmholtz Zentrum München, Munich, Germany.
⁴ LMU Open Science Center, LMU Munich, Munich, Germany.

Abstract

Computational reproducibility is a cornerstone of sound and credible research. In complex statistical analyses, such as the analysis of longitudinal data, reproducing results is far from simple, especially if no source code is available. In this work we aimed to reproduce the longitudinal data analyses of 11 articles published in PLOS ONE. Inclusion criteria were the availability of data and author consent. We investigated the types of methods and software used and whether we were able to reproduce the data analysis using open source software. Most articles provided overview tables and simple visualisations. Generalised Estimating Equations (GEEs) were the most popular statistical models among the selected articles. Only one article used open source software and only one published part of the analysis code. Reproduction was difficult in most cases and required reverse engineering of results or contacting the authors. For three articles we were not able to reproduce the results, and for another two we could reproduce only parts of them. For all but two articles we had to contact the authors to be able to reproduce the results. Our main lesson is that reproducing papers is difficult if no code is supplied, which places a high burden on those conducting the reproductions. Open data policies in journals are good, but to truly boost reproducibility we suggest adding open code policies.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Data Analysis
Humans
Longitudinal Studies
Publications
Reproducibility of Results
Research Design
Software

Grants and funding

This research has been supported by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A (Munich Center of Machine Learning) to HS.