Is There Evidence of P-Hacking in Imaging Research?

Can Assoc Radiol J. 2023 Aug;74(3):497-507. doi: 10.1177/08465371221139418. Epub 2022 Nov 22.

Abstract

Background: P-hacking, the practice of running selective analyses until results become statistically significant, is prevalent in many scientific disciplines.

Purpose: This study aims to assess whether p-hacking exists in imaging research.

Methods: The protocol, data, and code are available at https://osf.io/xz9ku/?view_only=a9f7c2d841684cb7a3616f567db273fa. We searched imaging journals in Ovid MEDLINE from 1972 to 2021. Text mining with a Python script was used to collect metadata: journal, publication year, title, abstract, and P-values from abstracts. One P-value was randomly sampled per abstract. We assessed for evidence of p-hacking using a p-curve, evaluating for a concentration of P-values just below .05. We conducted a one-tailed binomial test (α = .05 level of significance) to assess whether more P-values fell in the upper range (e.g., .045 < P < .05) than in the lower range (e.g., .04 < P < .045). To assess the variation introduced by randomly sampling a single P-value per abstract, we repeated the sampling process 1000 times and pooled results across the samples. Analyses were also stratified into 10-year periods to determine whether p-hacking practices evolved over time; a minimal sketch of this type of analysis follows below.
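As a rough illustration of the approach described above (not the authors' released code, which is available at the OSF link), the following Python sketch extracts P-values from abstract text with a regular expression, randomly samples one P-value per abstract, and applies a one-tailed binomial test comparing the count in the upper bin (.045 < P < .05) with the count in the lower bin (.04 < P < .045). The regex pattern and function names are illustrative assumptions, not the study's actual implementation.

```python
# Illustrative sketch only; the authors' extraction and analysis code is at the OSF link above.
import random
import re

from scipy.stats import binomtest  # SciPy >= 1.7

# Hypothetical pattern for P-values reported as "P = .032", "p<0.05", etc.
P_VALUE_RE = re.compile(r"[Pp]\s*[=<>]\s*(0?\.\d+)")


def extract_p_values(abstract: str) -> list[float]:
    """Return all P-values found in an abstract's text."""
    return [float(m) for m in P_VALUE_RE.findall(abstract)]


def sample_one_per_abstract(abstracts: list[str], seed: int = 0) -> list[float]:
    """Randomly sample a single P-value from each abstract that reports any."""
    rng = random.Random(seed)
    sampled = []
    for text in abstracts:
        values = extract_p_values(text)
        if values:
            sampled.append(rng.choice(values))
    return sampled


def p_curve_binomial_test(p_values: list[float]) -> float:
    """One-tailed binomial test: are there more P-values in (.045, .05)
    than in (.04, .045)?  A significant result would suggest p-hacking."""
    upper = sum(1 for p in p_values if 0.045 < p < 0.05)
    lower = sum(1 for p in p_values if 0.04 < p < 0.045)
    n = upper + lower
    if n == 0:
        return float("nan")
    # Under the null of no p-hacking, a P-value in the combined bin is
    # equally likely to fall in either half (probability 0.5).
    return binomtest(upper, n, p=0.5, alternative="greater").pvalue
```

The repeated-sampling step described in the Methods could then be approximated by calling sample_one_per_abstract with 1000 different seeds and pooling the bin counts before running the test; the stratified time-trend analysis would simply apply the same test within each 10-year window of publication years.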

Results: Our search of 136 journals identified 967,981 abstracts. Text mining identified 293,687 P-values, and 4105 randomly sampled P-values were included in the p-hacking analysis, representing 108/136 journals (80%) and 4105/967,981 abstracts (0.4%). P-values did not concentrate just under .05; in fact, more P-values fell in the lower range (.04 < P < .045) than just below .05 (.045 < P < .05), indicating a lack of evidence for p-hacking. Time-trend analysis did not identify p-hacking in any of the five 10-year periods.

Conclusion: We did not identify evidence of p-hacking in abstracts published in over 100 imaging journals since 1972. These analyses cannot detect all forms of p-hacking, and other forms of bias, such as publication bias and selective outcome reporting, may exist in imaging research.

Keywords: epidemiology; evidence-based practice; reporting bias; statistics.

MeSH terms

  • Publication Bias*
  • Statistics as Topic*