A non-parametric significance test to compare corpora

Alexander Koplenig

doi:10.1371/journal.pone.0222703

PLoS One. 2019 Sep 19;14(9):e0222703. doi: 10.1371/journal.pone.0222703. eCollection 2019.

Author

Alexander Koplenig¹

Affiliation

¹ Leibniz Institute for the German Language (IDS), Mannheim, Germany.

Abstract

Classical null hypothesis significance tests are not appropriate in corpus linguistics because the randomness assumption underlying these testing procedures is not fulfilled. Nevertheless, there are numerous scenarios in which it would be beneficial to have some kind of test to judge the relevance of a result (e.g., a difference between two corpora) by answering the question of whether the attribute of interest is pronounced enough to warrant the conclusion that it is substantial and not due to chance. In this paper, I outline such a test.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Language
Linguistics / methods*
Probability
Research Design

Grants and funding

The publication of this article was partially funded by the Open Access Fund of the Leibniz Association. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.