How often should we expect to be wrong? Statistical power, P values, and the expected prevalence of false discoveries

Biochem Pharmacol. 2018 May;151:226-233. doi: 10.1016/j.bcp.2017.12.011. Epub 2017 Dec 14.

Abstract

There is a clear perception in the literature that the biomedical sciences face a reproducibility crisis. Many underlying factors contributing to the prevalence of irreproducible results have been highlighted, with a focus on poor design and execution of experiments along with the misuse of statistics. While these factors certainly contribute to irreproducibility, relatively little attention outside of the specialized statistical literature has focused on the expected prevalence of false discoveries under idealized circumstances. In other words, when everything is done correctly, how often should we expect to be wrong? Using a simple simulation of an idealized experiment, it is possible to show the central role of sample size, and of the related quantity of statistical power, in determining the false discovery rate and in accurate estimation of effect size. According to our calculations, based on current practice many subfields of biomedical science may expect their discoveries to be false at least 25% of the time, and the only viable course to correct this is to require the reporting of statistical power and a minimum of 80% power (1 - β = 0.80) for all studies.
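The paper's own simulation is not reproduced in the abstract, but the arithmetic it rests on can be sketched as follows. In an idealized testing regime, the expected false discovery rate among "significant" results depends on only three quantities: the significance threshold α, the statistical power (1 - β), and the pre-study probability that a tested hypothesis is true. The parameter values below are illustrative choices, not the authors' figures.

```python
import random


def analytic_fdr(alpha, power, prior):
    """Expected false discovery rate among significant results.

    alpha: significance threshold (Type I error rate)
    power: 1 - beta, probability of detecting a true effect
    prior: pre-study probability that a tested hypothesis is true
    """
    true_pos = power * prior          # rate of correctly detected real effects
    false_pos = alpha * (1.0 - prior)  # rate of false alarms on null effects
    return false_pos / (false_pos + true_pos)


def simulate_fdr(alpha, power, prior, n_tests=200_000, seed=1):
    """Monte Carlo sketch of an idealized experiment: each test is a
    Bernoulli draw -- real effects reach significance with probability
    `power`, null effects with probability `alpha`."""
    rng = random.Random(seed)
    false_pos = true_pos = 0
    for _ in range(n_tests):
        effect_is_real = rng.random() < prior
        if effect_is_real:
            if rng.random() < power:
                true_pos += 1
        elif rng.random() < alpha:
            false_pos += 1
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # With alpha = 0.05 and half of tested hypotheses true, underpowered
    # studies (power = 0.2) yield a far higher FDR than 80%-powered ones.
    print(analytic_fdr(0.05, 0.2, 0.5))  # low power
    print(analytic_fdr(0.05, 0.8, 0.5))  # 80% power
    print(simulate_fdr(0.05, 0.8, 0.5))
```

Raising power from 0.2 to 0.8 at the same α and prior shrinks the analytic FDR several-fold, which is the mechanism behind the abstract's call for a mandatory 80%-power floor.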

Keywords: False discovery rate; Reproducibility; Sample size; Statistical power; p value.

Publication types

  • Review

MeSH terms

  • Animals
  • Biomedical Research / standards
  • Biomedical Research / statistics & numerical data*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • False Negative Reactions
  • False Positive Reactions
  • Humans
  • Prevalence
  • Reproducibility of Results*
  • Research Design / standards
  • Research Design / statistics & numerical data*
  • Sample Size