Statistical Assumptions and Reproducibility in Psychology: Data Mining Based on Open Science

Front Psychol. 2022 May 30;13:905977. doi: 10.3389/fpsyg.2022.905977. eCollection 2022.

Abstract

Failures of reproducibility in psychology (and in other social sciences) can be investigated by tracing the logical chain from statistical assumptions to conclusions. Starting from the normality assumption, the homoscedasticity assumption, and the robustness assumption, this research uses the R language to simulate and analyze the original data of the 100 studies in Estimating the Reproducibility of Psychological Science, in order to explore how the premise assumptions of statistical methods affect the reproducibility of psychological research. The results indicated the following: (1) whether a psychological study is reproducible is related to the subfield to which its subject matter belongs, (2) not all psychological variables satisfy the normality assumption, (3) the t-test is a more robust tool for psychological research than the analysis of variance (ANOVA), and (4) the robustness of ANOVA is independent of whether the analyzed data satisfy normality and homogeneity of variance. This study shows that the factors underlying reproducibility in psychology are more complex than expected.

Keywords: data mining; homoscedasticity hypothesis; normality hypothesis; reproducibility of psychology; robust hypothesis.
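The abstract does not reproduce the original simulation code. The following is a minimal R sketch of the general approach it describes: generating two groups whose data violate normality and homoscedasticity while the null hypothesis is true, then comparing the empirical Type I error rates of the two-sample t-test and one-way ANOVA. The sample sizes, distributions, and number of replications below are illustrative assumptions, not the authors' actual settings.

    # Monte Carlo check of t-test vs. one-way ANOVA robustness when the
    # normality and homoscedasticity assumptions are violated.
    # All settings (sample sizes, distributions, replications) are
    # illustrative assumptions, not the original study's parameters.
    set.seed(1)

    n_rep <- 5000   # number of simulated data sets
    n     <- 30     # observations per group

    simulate_once <- function() {
      # Two groups with equal means (null hypothesis true), but
      # skewed errors (non-normality) and unequal variances (heteroscedasticity).
      g1 <- rexp(n, rate = 1) - 1          # skewed, mean 0, sd 1
      g2 <- rnorm(n, mean = 0, sd = 3)     # normal, mean 0, sd 3
      y  <- c(g1, g2)
      group <- factor(rep(c("A", "B"), each = n))

      p_t     <- t.test(g1, g2)$p.value                      # Welch t-test
      p_anova <- summary(aov(y ~ group))[[1]][["Pr(>F)"]][1] # classical one-way ANOVA
      c(t_test = p_t, anova = p_anova)
    }

    p_values <- replicate(n_rep, simulate_once())

    # Empirical Type I error rate at alpha = 0.05; values close to 0.05
    # indicate robustness to the assumption violations simulated above.
    rowMeans(p_values < 0.05)

Under this kind of design, a rejection rate that stays near the nominal 0.05 level despite the violated assumptions indicates robustness of the corresponding test, which is the sense in which the abstract compares the t-test and ANOVA.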