Revisiting the analysis pipeline for overdispersed Poisson and binomial data

J Appl Stat. 2022 Jan 20;50(7):1455-1476. doi: 10.1080/02664763.2022.2026897. eCollection 2023.

Abstract

Overdispersion is a common feature in categorical data analysis, and several methods have been developed for detecting and handling it in generalized linear models. The first aim of this study is to clarify the relationships among various score statistics for testing overdispersion and to compare their performance. In addition, we investigate a principled way, based on restricted likelihood, to correct the finite-sample bias in the score statistic caused by estimating regression parameters. The second aim is to reconsider the current practice for handling overdispersed categorical data. Although the conventional models are based on substantially different mechanisms for generating overdispersion, model selection in practice has not been well studied. We perform an intensive numerical study to determine which method is more robust to various overdispersion mechanisms. In addition, we provide some graphical tools for identifying the better model. The last aim is to reconsider the key assumption for deriving the score statistics. We study the meaning of testing overdispersion when this assumption is violated, and we analytically show the conditions under which it is not appropriate to employ the current statistical practices for analyzing overdispersed data.
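
As a rough illustration of what a score test for overdispersion computes, the sketch below implements one familiar form for Poisson counts, T = sum((y_i - mu_i)^2 - y_i) / sqrt(2 * sum(mu_i^2)), which is approximately standard normal under the Poisson null. This is an assumption made for illustration and is not necessarily one of the statistics compared in the paper. The intercept-only model (fitted mean equal to the sample mean), the simulated data, and the helper names dean_score_statistic, upper_tail_p_value, and poisson_draw are all hypothetical, and no finite-sample bias correction is applied.

```python
import math
import random


def dean_score_statistic(y, mu):
    """Score-type statistic for Poisson overdispersion (illustrative form).

    y  : observed counts
    mu : fitted Poisson means under the null model
    Large positive values suggest variance exceeding the mean.
    """
    num = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu))
    den = math.sqrt(2.0 * sum(mi ** 2 for mi in mu))
    return num / den


def upper_tail_p_value(t):
    """One-sided p-value from the standard normal approximation."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(0)
n = 500

# Pure Poisson counts with mean 4 (no overdispersion).
y_pois = [poisson_draw(rng, 4.0) for _ in range(n)]

# Gamma-mixed Poisson counts: mean 4, variance 4 + 8 = 12 (overdispersed).
y_mix = [poisson_draw(rng, rng.gammavariate(2.0, 2.0)) for _ in range(n)]

# Intercept-only Poisson model: the fitted mean is the sample mean.
mu_pois = [sum(y_pois) / n] * n
mu_mix = [sum(y_mix) / n] * n

t_pois = dean_score_statistic(y_pois, mu_pois)
t_mix = dean_score_statistic(y_mix, mu_mix)

print("Poisson data:      T =", round(t_pois, 3), "p =", upper_tail_p_value(t_pois))
print("Gamma-mixed data:  T =", round(t_mix, 3), "p =", upper_tail_p_value(t_mix))
```

With covariates, mu would instead hold the fitted means from a Poisson regression. Estimating those regression parameters is the source of the finite-sample bias mentioned above, which this sketch leaves uncorrected.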

Keywords: Overdispersion; graphical tools; parametric bootstrap; restricted likelihood; score test.

Publication types

  • Review

Grants and funding

Woojoo Lee and Donghwan Lee were supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) [grant numbers 2021R1A2C1014409 and 2021R1A2C1012865].