Approaches for dealing with various sources of overdispersion in modeling count data: Scale adjustment versus modeling

Stat Methods Med Res. 2017 Aug;26(4):1802-1823. doi: 10.1177/0962280215588569. Epub 2015 May 31.

Abstract

Overdispersion is a common problem in count data. It can arise from extra population heterogeneity, omission of key predictors, or outliers. Unless properly handled, it can lead to invalid inference. Our goal is to assess the differential performance of methods for dealing with overdispersion from several sources. We considered six approaches: unadjusted Poisson regression (Poisson), deviance-scale-adjusted Poisson regression (DS-Poisson), Pearson-scale-adjusted Poisson regression (PS-Poisson), negative-binomial regression (NB), and two generalized linear mixed models (GLMMs) with a random intercept and log link, one with a Poisson distribution (Poisson-GLMM) and one with a negative-binomial distribution (NB-GLMM). To rank the models in order of preference, we used Akaike information criterion (AIC) and Bayesian information criterion (BIC) values, standard errors, and 95% confidence-interval coverage of the parameter values. To compare these methods, we used simulated count data with overdispersion of different magnitudes from three different sources, in which the mean of the count response was associated with three predictors. Data from two real case studies were also analyzed. The simulation results showed that NB and NB-GLMM were preferred for dealing with overdispersion resulting from any of the sources we considered. Poisson and DS-Poisson often produced smaller standard-error estimates than expected, whereas PS-Poisson conversely produced larger standard-error estimates. Thus, it is good practice to compare several model options to determine the best method of modeling count data.
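The difference between the Poisson, DS-Poisson, and PS-Poisson approaches can be illustrated with a minimal sketch. It assumes the usual quasi-likelihood convention, which the abstract does not spell out: fit an ordinary log-link Poisson regression, estimate the dispersion φ either as Pearson χ²/df (PS) or as deviance/df (DS), and multiply the model-based standard errors by √φ. The helper names (`fit_poisson_irls`, `dispersion_estimates`) are hypothetical. The simulation settings (seed, sample size, coefficients, and the negative-binomial size parameter `k`) are illustrative assumptions and are not the paper's design. The overdispersion here comes from a gamma-Poisson mixture, i.e. extra population heterogeneity.

```python
import numpy as np


def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares (IRLS).

    Returns coefficients, fitted means, and the model-based covariance
    matrix, which assumes the Poisson dispersion phi = 1.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # assumes column 0 of X is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response for the log link
        XtW = X.T * mu            # working weights equal mu for log-link Poisson
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)  # inverse Fisher information
    return beta, mu, cov


def dispersion_estimates(y, mu, n_params):
    """Pearson chi-square / df and deviance / df for a fitted Poisson model."""
    df = len(y) - n_params
    pearson = np.sum((y - mu) ** 2 / mu) / df
    # The unit-deviance term y*log(y/mu) is defined as 0 when y == 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        ylogy = np.where(y > 0, y * np.log(y / mu), 0.0)
    deviance = 2.0 * np.sum(ylogy - (y - mu)) / df
    return pearson, deviance


# Hypothetical simulation: gamma-Poisson (negative-binomial) counts.
rng = np.random.default_rng(2015)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 predictors
true_beta = np.array([1.0, 0.3, -0.2, 0.1])
k = 2.0  # NB size parameter: Var(y) = mu + mu**2 / k
lam = np.exp(X @ true_beta) * rng.gamma(shape=k, scale=1.0 / k, size=n)
y = rng.poisson(lam)

beta_hat, mu_hat, cov = fit_poisson_irls(X, y)
pearson_phi, deviance_phi = dispersion_estimates(y, mu_hat, X.shape[1])

se_poisson = np.sqrt(np.diag(cov))          # unadjusted Poisson
se_ps = se_poisson * np.sqrt(pearson_phi)   # PS-Poisson (assumed convention)
se_ds = se_poisson * np.sqrt(deviance_phi)  # DS-Poisson (assumed convention)

print("beta_hat:", np.round(beta_hat, 3))
print(f"Pearson phi = {pearson_phi:.2f}, deviance phi = {deviance_phi:.2f}")
print("SE Poisson:   ", np.round(se_poisson, 4))
print("SE PS-Poisson:", np.round(se_ps, 4))
print("SE DS-Poisson:", np.round(se_ds, 4))
```

In this sketch, scale adjustment leaves the coefficient estimates unchanged and only rescales their standard errors. The NB and GLMM approaches compared in the paper instead change the assumed distribution of the counts.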

Keywords: Count data; Poisson; generalized linear mixed model; negative-binomial; overdispersion.

MeSH terms

  • Aged
  • Bayes Theorem
  • Female
  • Humans
  • Linear Models
  • Lung Neoplasms / diagnostic imaging
  • Lung Neoplasms / mortality
  • Male
  • Middle Aged
  • Models, Statistical*
  • Poisson Distribution*
  • Regression Analysis*
  • Salmonella / drug effects