A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses

PLoS One. 2021 Jun 21;16(6):e0251194. doi: 10.1371/journal.pone.0251194. eCollection 2021.

Abstract

Computational reproducibility is a cornerstone of sound and credible research. In complex statistical analyses, such as the analysis of longitudinal data, reproducing results is far from simple, especially if no source code is available. In this work we aimed to reproduce the longitudinal data analyses of 11 articles published in PLOS ONE. Inclusion criteria were the availability of data and author consent. We investigated the types of methods and software used and whether we were able to reproduce the data analysis using open source software. Most articles provided overview tables and simple visualisations. Generalised Estimating Equations (GEEs) were the most popular statistical models among the selected articles. Only one article used open source software and only one published part of the analysis code. Replication was difficult in most cases and required reverse engineering of results or contacting the authors. For three articles we were not able to reproduce the results; for another two we could reproduce only parts of them. For all but two articles we had to contact the authors to be able to reproduce the results. Our main lesson is that reproducing papers is difficult if no code is supplied, which places a high burden on those conducting the reproductions. Open data policies in journals are good, but to truly boost reproducibility we suggest adding open code policies.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods*
  • Data Analysis
  • Humans
  • Longitudinal Studies
  • Publications
  • Reproducibility of Results
  • Research Design
  • Software

Grants and funding

This research has been supported by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A (Munich Center of Machine Learning) to HS.