Dataset decay and the problem of sequential analyses on open datasets

eLife. 2020 May 19;9:e53498. doi: 10.7554/eLife.53498.

Abstract

Open data allows researchers to explore pre-existing datasets in new ways. However, if many researchers reuse the same dataset, multiple statistical testing may increase false positives. Here we demonstrate that sequential hypothesis testing on the same dataset by multiple researchers can inflate error rates. We go on to discuss a number of correction procedures that can reduce the number of false positives, and the challenges associated with these correction procedures.
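The inflation described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: every null hypothesis is taken to be true, the sequential tests are treated as independent (tests on a shared dataset are usually correlated), and the names `simulate_fwer`, `uncorrected` and `spending` are hypothetical. The `spending` rule, which halves the threshold at each new test so that the thresholds sum to less than alpha, is just one simple example of a sequential correction chosen for illustration.

```python
import random


def simulate_fwer(n_tests, threshold, n_sims=20000, seed=0):
    """Estimate the probability of at least one false positive across
    n_tests sequential tests when every null hypothesis is true.

    threshold(i) gives the significance threshold for the i-th test (1-based).
    Assumption: tests are independent, so each p-value is an independent
    uniform draw on [0, 1].
    """
    rng = random.Random(seed)
    runs_with_false_positive = 0
    for _ in range(n_sims):
        # Under a true null, a valid p-value is uniform on [0, 1].
        if any(rng.random() < threshold(i) for i in range(1, n_tests + 1)):
            runs_with_false_positive += 1
    return runs_with_false_positive / n_sims


ALPHA = 0.05


def uncorrected(i):
    # Every reuse of the dataset is tested at the nominal level.
    return ALPHA


def spending(i):
    # Illustrative rule (an assumption): halve the threshold for each new
    # test, so the thresholds sum to less than ALPHA.
    return ALPHA * 0.5 ** i


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        analytic = 1 - (1 - ALPHA) ** k  # closed form for independent tests
        print(k, simulate_fwer(k, uncorrected), simulate_fwer(k, spending), analytic)
```

With uncorrected thresholds the estimated error rate should track the closed form 1 - (1 - alpha)^k and grow with each reuse of the dataset, while the halving rule keeps it below alpha. The cost is that later analyses face ever stricter thresholds, which is one of the challenges such corrections raise.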

Keywords: computational biology; human; meta-research; multiple comparison correction; multiple comparisons; neuroscience; open data; sequential testing; systems biology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Access to Information
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Datasets as Topic* / standards
  • False Positive Reactions
  • Humans
  • Information Dissemination*
  • Periodicals as Topic
  • Time Factors