Screening p-hackers: Dissemination noise as bait

Federico Echenique; Kevin He

doi:10.1073/pnas.2400787121

Proc Natl Acad Sci U S A. 2024 May 21;121(21):e2400787121. doi: 10.1073/pnas.2400787121. Epub 2024 May 17.

Authors

Federico Echenique^#¹, Kevin He^#²

Affiliations

¹ Department of Economics, University of California, Berkeley, CA 94720.
² Department of Economics, University of Pennsylvania, Philadelphia, PA 19104.

^# Contributed equally.

PMID: 38758697
PMCID: PMC11126912 (available on 2024-11-17)
DOI: 10.1073/pnas.2400787121

Abstract

We show that adding noise before publishing data effectively screens p-hacked findings: spurious explanations produced by fitting many statistical models (data mining). Noise creates "baits" that affect two types of researchers differently. Uninformed p-hackers, who are fully ignorant of the true mechanism and engage in data mining, often fall for baits. Informed researchers, who start with an ex ante hypothesis, are minimally affected. We show that as the number of observations grows large, dissemination noise asymptotically achieves optimal screening. In a tractable special case where the informed researchers' theory can identify the true causal mechanism with very few data, we characterize the optimal level of dissemination noise and highlight the relevant trade-offs. Dissemination noise is a tool that statistical agencies currently use to protect privacy. We argue this existing practice can be repurposed to screen p-hackers and thus improve research credibility.
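The baiting mechanism described in the abstract can be illustrated with a small simulation. This is a toy sketch under assumptions of our own, not the authors' model: outcomes are binary, the true cause predicts the outcome perfectly, every other covariate is an independent coin flip, noise flips each released outcome with probability `flip_prob`, and a finding is validated by checking its fit against the raw outcomes. The function names, parameters, and the validation rule (`threshold`) are all hypothetical.

```python
import random


def add_dissemination_noise(outcomes, flip_prob, rng):
    """Flip each binary outcome independently with probability flip_prob."""
    return [1 - y if rng.random() < flip_prob else y for y in outcomes]


def agreement(covariate, outcomes):
    """Fraction of observations on which a binary covariate equals the outcome."""
    return sum(x == y for x, y in zip(covariate, outcomes)) / len(outcomes)


def simulate(n_obs=20, n_covariates=500, flip_prob=0.2, threshold=1.0,
             trials=100, seed=0):
    """Return (hacker_pass_rate, informed_pass_rate) under the toy assumptions."""
    rng = random.Random(seed)
    hacker_passes = informed_passes = 0
    for _ in range(trials):
        raw = [rng.randint(0, 1) for _ in range(n_obs)]
        true_cause = list(raw)  # assumption: the true cause predicts y perfectly
        # Assumption: every other covariate is an independent coin flip.
        irrelevant = [[rng.randint(0, 1) for _ in range(n_obs)]
                      for _ in range(n_covariates)]
        released = add_dissemination_noise(raw, flip_prob, rng)

        # Data miner: report whichever covariate best fits the released data.
        hacker_pick = max([true_cause] + irrelevant,
                          key=lambda c: agreement(c, released))

        # Informed researcher: a prior theory narrows the search to two
        # candidates (assumption), and the released data picks between them.
        red_herring = irrelevant[0]
        if agreement(true_cause, released) > agreement(red_herring, released):
            informed_pick = true_cause
        else:
            informed_pick = red_herring

        # Validation (assumption): the finding must fit the raw outcomes.
        hacker_passes += agreement(hacker_pick, raw) >= threshold
        informed_passes += agreement(informed_pick, raw) >= threshold
    return hacker_passes / trials, informed_passes / trials
```

Calling `simulate(flip_prob=0.0)` and `simulate(flip_prob=0.2)` and comparing the two returned rates gives a rough feel for the trade-off: with no noise, whatever the data miner finds also fits the raw data, whereas with noise the best-fitting covariate on the released data may be a bait that fails validation.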

Keywords: dissemination noise; p-hacking; privacy; research integrity.
