Significance Testing Needs a Taxonomy: Or How the Fisher, Neyman-Pearson Controversy Resulted in the Inferential Tail Wagging the Measurement Dog

Michael T Bradley; Andrew Brand

doi:10.1177/0033294116662659

Psychol Rep. 2016 Oct;119(2):487-504. doi: 10.1177/0033294116662659. Epub 2016 Aug 8.

Authors

Michael T Bradley¹, Andrew Brand²

Affiliations

¹ University of New Brunswick, Saint John, NB, Canada. Email: bradley@unb.ca
² NWORTH Bangor Clinical Trials Unit, Institute of Medical & Social Care Research, Bangor University, UK.

PMID: 27502529

Abstract

Accurate measurement and a cutoff probability with inferential statistics are not wholly compatible. Fisher understood this when he developed the F test to deal with measurement variability and to make judgments on manipulations that may be worth further study. Neyman and Pearson focused on modeled distributions whose parameters were highly determined and concluded that inferential judgments following an F test could be made with accuracy because the distribution parameters were determined. Neyman and Pearson's approach to applying statistical analyses using alpha and beta error rates has played a dominant role in guiding inferential judgments, appropriately in highly determined situations and inappropriately in scientific exploration. Fisher tried to explain the different situations but, in part due to some obscure wording, generated a long-standing dispute that has left the importance of Fisher's p < .05 criterion not fully understood and the Neyman and Pearson error rate approach generally endorsed. Problems were compounded when power calculations based on effect sizes following significant results entered exploratory science. To clarify in a practical sense when each approach should be used, a dimension reflecting varying levels of certainty or knowledge of population distributions is presented. The dimension provides a taxonomy of statistical situations and appropriate approaches by delineating four zones that represent how well the underlying population of interest is defined, ranging from exploratory situations to highly determined populations.

Keywords: Correct use of statistical techniques; error rates; measurement; measures and statistics; probabilities; psychometrics; significance testing; taxonomy.

MeSH terms

Data Interpretation, Statistical*
Humans
Psychometrics / standards*