A comparison study on modeling of clustered and overdispersed count data for multiple comparisons

Jochen Kruppa; Ludwig Hothorn

doi:10.1080/02664763.2020.1788518

J Appl Stat. 2020 Jul 3;48(16):3220-3232. doi: 10.1080/02664763.2020.1788518. eCollection 2021.

Authors

Jochen Kruppa¹ ², Ludwig Hothorn³

Affiliations

¹ Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Berlin, Germany.
² Berlin Institute of Health (BIH), Berlin, Germany.
³ Institute of Biostatistics, Leibniz University Hannover, Hannover, Germany.

Abstract

Data collected in many scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This must be considered when estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone inflates the type I error rate. We carry out simulation studies under several data settings and compare different multiple contrast tests, using parameter estimates from generalized estimation equations and generalized linear mixed models, with respect to coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion.

Keywords: Generalized estimation equations; generalized linear mixed models; overdispersion; repeated measurements; simultaneous contrast tests.
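
To make the data setting in the abstract concrete, the following minimal sketch generates clustered, overdispersed counts for a few treatment levels and compares the sample mean and variance per level. It is not the authors' simulation code: the gamma-Poisson (negative binomial) mixture, the normal cluster effect on the log scale, the function names `rpois` and `simulate_counts`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random
import statistics


def rpois(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small to moderate means used in this sketch.
    """
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(group_means, n_clusters, cluster_size,
                    cluster_sd, dispersion, seed=1):
    """Return rows (group, cluster_id, count) of clustered, overdispersed counts.

    Assumed model (hypothetical, not taken from the paper):
      cluster effect b ~ Normal(0, cluster_sd) on the log scale,
      multiplier g ~ Gamma(shape=1/dispersion, scale=dispersion), mean 1,
      count ~ Poisson(mean * exp(b) * g).
    """
    rng = random.Random(seed)
    rows = []
    cluster_id = 0
    for group, mean in group_means.items():
        for _ in range(n_clusters):
            b = rng.gauss(0.0, cluster_sd)  # shared by all members of a cluster
            for _ in range(cluster_size):
                g = rng.gammavariate(1.0 / dispersion, dispersion)
                rows.append((group, cluster_id,
                             rpois(mean * math.exp(b) * g, rng)))
            cluster_id += 1
    return rows


# Small-sample setting: 3 treatment levels, 4 clusters (e.g. litters) of 5 each.
rows = simulate_counts({"control": 5.0, "dose1": 5.0, "dose2": 8.0},
                       n_clusters=4, cluster_size=5,
                       cluster_sd=0.3, dispersion=0.5)

for group in ("control", "dose1", "dose2"):
    counts = [c for g, _, c in rows if g == group]
    print(group,
          "mean:", round(statistics.mean(counts), 2),
          "variance:", round(statistics.variance(counts), 2))
```

Under a plain Poisson model the variance would equal the mean. In this sketch the gamma multiplier and the shared cluster effect both push the variance above the mean, which is the kind of overdispersion and clustering that the compared models have to account for.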

Publication types

Review