How the Post-Data Severity Converts Testing Results into Evidence for or against Pertinent Inferential Claims

Aris Spanos

doi:10.3390/e26010095

Entropy (Basel). 2024 Jan 22;26(1):95. doi: 10.3390/e26010095.

Author

Aris Spanos¹

Affiliation

¹ Department of Economics, Virginia Tech, Blacksburg, VA 24061, USA.

PMID: 38275503

Abstract

The paper makes a case that the current discussions on replicability and the abuse of significance testing have overlooked a more general contributor to the untrustworthiness of published empirical evidence: the uninformed and recipe-like implementation of statistical modeling and inference. It is argued that this contributes to the untrustworthiness problem in several different ways, including [a] statistical misspecification, [b] unwarranted evidential interpretations of frequentist inference results, and [c] questionable modeling strategies that rely on curve-fitting. What is more, the alternative proposals to replace or modify frequentist testing, including [i] replacing p-values with observed confidence intervals and effect sizes, and [ii] redefining statistical significance, will not address the untrustworthiness of evidence problem, since they are equally vulnerable to [a]-[c]. The paper calls for distinguishing between unduly data-dependent 'statistical results', such as a point estimate, a p-value, and accept/reject H0, and 'evidence for or against inferential claims'. The post-data severity (SEV) evaluation of the accept/reject H0 results converts them into evidence for or against germane inferential claims. These claims can be used to address and elucidate several foundational issues, including (i) statistical vs. substantive significance, (ii) the large n problem, and (iii) the replicability of evidence. Also, the SEV perspective sheds light on the impertinence of the proposed alternatives [i]-[ii], and challenges [iii] the alleged arbitrariness of framing H0 and H1, which is often exploited to undermine the credibility of frequentist testing.
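To make the SEV evaluation more concrete, here is a minimal sketch for the simplest case. It assumes the simple Normal model with known σ and the one-sided test of H0: μ ≤ μ0 vs. H1: μ > μ0 with test statistic d(X) = √n(X̄ - μ0)/σ. This textbook setup, the function name `severity`, and its parameters are illustrative assumptions, not code from the paper. After a rejection, the sketch evaluates the claim μ > μ1 as P(d(X) ≤ d(x0); μ = μ1). After an acceptance, it evaluates μ ≤ μ1 as P(d(X) > d(x0); μ = μ1). The usage lines illustrate the large n problem. The same observed d(x0) = 2.0 (p-value ≈ 0.023) gives fairly strong evidence for μ > 0.1 when n = 100 (SEV ≈ 0.84) but virtually none when n = 10000.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # N(0, 1); its .cdf is the standard Normal CDF Phi


def severity(xbar, mu0, mu1, sigma, n, rejected):
    """Hypothetical helper: post-data severity for the one-sided test
    H0: mu <= mu0 vs H1: mu > mu0 in the simple Normal model with known sigma.

    rejected=True  -> SEV(mu > mu1)  = P(d(X) <= d(x0); mu = mu1)
    rejected=False -> SEV(mu <= mu1) = P(d(X) >  d(x0); mu = mu1)
    """
    d_obs = sqrt(n) * (xbar - mu0) / sigma   # observed test statistic d(x0)
    delta1 = sqrt(n) * (mu1 - mu0) / sigma   # d(X) ~ N(delta1, 1) when mu = mu1
    p_at_most = _STD_NORMAL.cdf(d_obs - delta1)  # P(d(X) <= d(x0); mu = mu1)
    return p_at_most if rejected else 1.0 - p_at_most


if __name__ == "__main__":
    # Same observed d(x0) = 2.0 (H0 rejected at alpha = 0.025, c = 1.96),
    # obtained with two very different sample sizes.
    for n in (100, 10_000):
        xbar = 2.0 / sqrt(n)  # mu0 = 0, sigma = 1, so d(x0) = sqrt(n) * xbar = 2.0
        sev = severity(xbar, mu0=0.0, mu1=0.1, sigma=1.0, n=n, rejected=True)
        print(f"n={n:>6}: SEV(mu > 0.1) = {sev:.4f}")
```

A single function covers both branches because SEV is always computed for the claim that the observed accept/reject result is taken to support. The discrepancy `mu1` is the quantity whose warrant is being assessed, which is why varying it maps a single test result onto a range of inferential claims.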

Keywords: effect sizes; observed confidence intervals; p-hacking; post-data severity evaluation; pre-data vs. post-data error probabilities; replication; statistical misspecification; statistical vs. substantive significance; untrustworthy evidence.