On testing for homogeneity with zero-inflated models through the lens of model misspecification

Int Stat Rev. 2022 Apr;90(1):62-77. doi: 10.1111/insr.12462. Epub 2021 Jul 5.

Abstract

In many applications of two-component mixture models, such as the popular zero-inflated model for discrete-valued data, it is customary for the data analyst to assess the inherent heterogeneity in light of the observed data. To this end, the score test, valued for its simplicity, is routinely performed. It has long been recognized that this test may behave erratically under model misspecification, but the implications of this behavior remain poorly understood for popular two-component mixture models. For the special case of zero-inflated count models, we use simulations and theoretical arguments to evaluate this behavior and discuss its implications in settings where the working model is restrictive relative to the true data-generating mechanism. We enrich this discussion with an analysis of count data from HIV research, where a one-component model is shown to fit the data reasonably well despite apparent extra zeros. These results suggest that a rejection of homogeneity does not imply that the underlying mixture model is appropriate. Rather, such a rejection simply implies that the mixture model should be interpreted carefully in light of potential model misspecification, and further evaluated against competing models.
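The behavior described above can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes an intercept-only Poisson working model and uses the standard score statistic for zero inflation, S = U^2 / I, where U = sum_i (1{y_i = 0} - p0) / p0 with p0 = exp(-ybar), and I = n[(1 - p0)/p0 - ybar] is the information for the zero-inflation weight after adjusting for estimation of the Poisson mean; under homogeneity S is asymptotically chi-square with 1 degree of freedom. The helper names (`zip_score_test`, `rpois`, `rejection_rate`, the two samplers) and the simulation settings (n = 200, mean 2, negative binomial size 2) are illustrative assumptions. Negative binomial data contain no zero-inflation component, yet they have more zeros than a Poisson with the same mean, so a Poisson-based score test is expected to reject homogeneity often.

```python
import math
import random


def zip_score_test(y):
    """Score test of H0: no zero inflation, against an intercept-only
    Poisson working model. Returns (statistic, p-value)."""
    n = len(y)
    lam = sum(y) / n                   # MLE of the Poisson mean under H0
    if lam == 0:
        raise ValueError("all counts are zero; statistic is undefined")
    p0 = math.exp(-lam)                # fitted probability of a zero
    n0 = sum(1 for v in y if v == 0)   # observed number of zeros
    # Score for the zero-inflation weight, evaluated at weight = 0
    u = (n0 - n * p0) / p0
    # Information for the weight, adjusted for estimating lam
    info = n * ((1 - p0) / p0 - lam)
    stat = u * u / info
    # Upper tail of chi-square(1): P(X > s) = erfc(sqrt(s / 2))
    pval = math.erfc(math.sqrt(stat / 2))
    return stat, pval


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rejection_rate(sampler, n, reps, alpha=0.05, seed=1):
    """Fraction of simulated samples of size n in which H0 is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(reps):
        y = [sampler(rng) for _ in range(n)]
        _, pval = zip_score_test(y)
        if pval < alpha:
            rejected += 1
    return rejected / reps


def poisson_sampler(rng):
    # Correctly specified: Poisson with mean 2, no extra zeros
    return rpois(2.0, rng)


def negbin_sampler(rng):
    # Misspecified: gamma-Poisson mixture, i.e. negative binomial with
    # mean 2 and size 2 (gamma shape 2, scale 1); overdispersed, not zero-inflated
    return rpois(rng.gammavariate(2.0, 1.0), rng)


if __name__ == "__main__":
    print("Poisson data:          ", rejection_rate(poisson_sampler, 200, 500))
    print("Negative binomial data:", rejection_rate(negbin_sampler, 200, 500))
```

Under the correctly specified Poisson model the rejection rate should be roughly the nominal 5%, while under the negative binomial one would expect it to be close to one. A rejection here reflects a departure from the Poisson working model, not evidence that a zero-inflated mixture generated the data.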

Keywords: count data; cure rate survival model; score test; size of test.