Pearson's goodness-of-fit tests for sparse distributions

J Appl Stat. 2021 Dec 30;50(5):1078-1093. doi: 10.1080/02664763.2021.2017413. eCollection 2023.

Abstract

Pearson's chi-squared test is widely used to test the goodness of fit between categorical data and a given discrete distribution function. When the number of categories, say k, is a fixed integer, Pearson's chi-squared test statistic converges in distribution to a chi-squared distribution with k-1 degrees of freedom as the sample size n goes to infinity. In real applications, k often changes with n and may even be much larger than n. Using martingale techniques, we prove that Pearson's chi-squared test statistic converges to a normal distribution under quite general conditions. We also propose a new test statistic which, in our simulation study, is more powerful than the chi-squared test statistic. A real application to lottery data is provided to illustrate our methodology.
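As a rough illustration of the setting described above, the following Python sketch computes Pearson's statistic for a fully specified discrete distribution and standardizes it for use with a normal reference distribution. The centering k-1 and scaling sqrt(2(k-1)) are the mean and variance of the classical chi-squared limit and are assumed here only for illustration; the abstract does not state the paper's exact normalization or conditions, and the newly proposed statistic is not reproduced. The function names are hypothetical.

```python
import math
import random


def pearson_statistic(counts, probs):
    """Pearson's chi-squared statistic: sum of (O_i - n*p_i)^2 / (n*p_i)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def standardized_statistic(counts, probs):
    """Center by k-1 and scale by sqrt(2(k-1)).

    Assumption: this uses the moments of a chi-squared variable with
    k-1 degrees of freedom; the paper's normalization may differ.
    """
    k = len(counts)
    x2 = pearson_statistic(counts, probs)
    return (x2 - (k - 1)) / math.sqrt(2 * (k - 1))


def upper_tail_p_value(z):
    """Upper-tail probability of a standard normal variable at z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Sparse example: k = 200 equally likely cells, only n = 50 observations.
    random.seed(0)
    k, n = 200, 50
    probs = [1.0 / k] * k
    counts = [0] * k
    for _ in range(n):
        counts[random.randrange(k)] += 1
    z = standardized_statistic(counts, probs)
    print("standardized statistic:", z)
    print("approximate p-value:", upper_tail_p_value(z))
```

With k much larger than n, most expected cell counts n*p_i fall well below the usual rule-of-thumb thresholds for the chi-squared approximation, which is the sparse regime the abstract refers to.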

Keywords: 62E20; Goodness-of-fit; chi-square approximation; discrete distribution; normal approximation; sparse distribution.

Grants and funding

The research of Shuhua Chang was supported in part by the National Basic Research Program of China (973 Program) [grant number 2012CB955804], the National Basic Research Program [grant number 2012CB955804], the National Natural Science Foundation of China [grant number 11771322], and the Major Project of Tianjin University of Finance and Economics [grant number ZD 1302]. The research of Deli Li was partially supported by a grant from the Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada [grant number RGPIN-2019-06065]. The research of Yongcheng Qi was supported in part by the National Science Foundation [grant number DMS-1916014].