The Wilcoxon-Mann-Whitney test under scrutiny

Morten W Fagerland; Leiv Sandvik

doi:10.1002/sim.3561

Stat Med. 2009 May 1;28(10):1487-97. doi: 10.1002/sim.3561.

Authors

Morten W Fagerland¹, Leiv Sandvik

Affiliation

¹ Ullevål Department of Research Administration, Oslo University Hospital, Norway. morten.fagerland@medisin.uio.no

PMID: 19247980

Abstract

The Wilcoxon-Mann-Whitney (WMW) test is often used to compare the means or medians of two independent, possibly nonnormal distributions. For this problem, the true significance level of the large-sample approximate version of the WMW test is known to be sensitive to differences in the shapes of the distributions. Based on a wide-ranging simulation study, our paper shows that this lack of robustness is more serious than is generally thought. In particular, small differences in variances and moderate degrees of skewness can produce large deviations from the nominal type I error rate. The problem is further exacerbated when the two distributions have different degrees of skewness. Other rank-based methods, such as the Fligner-Policello (FP) test and the Brunner-Munzel (BM) test, perform similarly, although the BM test is generally better. By considering the WMW test as a two-sample T test on ranks, we explain the results by noting some undesirable properties of the rank transformation. In practice, the ranked samples should be examined and found to sufficiently satisfy reasonable symmetry and variance homogeneity before the test results are interpreted.
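The kind of simulation the abstract describes can be illustrated with a minimal sketch. This is not the authors' code or design. It assumes two normal distributions with equal means (and so equal medians), continuous data without ties, and the large-sample normal approximation with no continuity correction. The function names `wmw_z` and `rejection_rate`, the sample sizes, and the standard deviations are hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist


def wmw_z(x, y):
    """Large-sample WMW z statistic (assumes continuous data, so no tie correction)."""
    m, n = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Rank sum of the first sample (ranks start at 1).
    r1 = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    u = r1 - m * (m + 1) / 2
    mean_u = m * n / 2
    sd_u = (m * n * (m + n + 1) / 12) ** 0.5
    return (u - mean_u) / sd_u


def rejection_rate(m, n, sd_x, sd_y, reps=4000, alpha=0.05, seed=1):
    """Estimate how often the two-sided test rejects when both means are 0."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd_x) for _ in range(m)]
        y = [rng.gauss(0.0, sd_y) for _ in range(n)]
        if abs(wmw_z(x, y)) > crit:
            rejections += 1
    return rejections / reps


# Identical distributions: the estimate should sit near the nominal 5%.
print("equal variances:  ", rejection_rate(25, 25, 1.0, 1.0))
# Equal means, but the smaller sample has the larger variance (illustrative values).
print("unequal variances:", rejection_rate(10, 40, 4.0, 1.0))
```

Under these assumptions the second estimate should come out noticeably above 0.05, although both means are identical. The variance ratio here is deliberately large to make the effect easy to see; the settings actually studied are those reported in the paper itself.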

Publication types

Comparative Study
Evaluation Study

MeSH terms

Biometry / methods*
Data Interpretation, Statistical
Humans
Models, Statistical
Statistical Distributions
Statistics, Nonparametric