Bias properties of Bayesian statistics in finite mixture of negative binomial regression models in crash data analysis

Accid Anal Prev. 2010 Mar;42(2):741-9. doi: 10.1016/j.aap.2009.11.002. Epub 2009 Dec 16.

Abstract

Factors that cause heterogeneity in crash data are often unknown to researchers, and failure to accommodate such heterogeneity in statistical models can undermine the validity of empirical results. A recently proposed finite mixture of negative binomial regression models has shown potential advantages in addressing unobserved heterogeneity, as well as in providing useful information about features of the population under study. Despite its usefulness, however, no study was found that examined the performance of this finite mixture under the combinations of sample size and sample-mean value that are common in crash data analysis. This study investigated the bias associated with the Bayesian summary statistics (posterior mean and median) of the dispersion parameters in two-component finite mixtures of negative binomial regression models. A simulation study was conducted using various sample sizes under different sample-mean values. Two prior specifications (non-informative and weakly informative) on the dispersion parameter were also compared. The results showed that the posterior mean under the non-informative prior exhibited a high bias for the dispersion parameter and should be avoided when the dataset contains fewer than 2,000 observations (even for high sample-mean values). The posterior median showed much better bias properties, particularly at small sample sizes and small sample means. However, as the sample size increased, the posterior median under the non-informative prior also began to exhibit an upward bias. In such cases, the posterior mean or median under the weakly informative prior provided smaller bias. Based on the simulation results, guidelines on the choice of prior and summary statistic are presented for different sample sizes and sample-mean values.
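The abstract does not spell out the model, so the following is only a minimal sketch of the data-generating side of such a simulation study, assuming the common NB2 form in which each component has a log-linear mean and variance mu + alpha * mu**2 (the paper may parameterize dispersion differently, for example through its inverse). It draws crash counts from a two-component finite mixture of negative binomial regressions and computes the bias of a summary statistic across replications. The single uniform covariate, the coefficient layout, and the function names `simulate_fmnb2` and `bias` are hypothetical, and the Bayesian (MCMC) fitting step that would produce posterior means and medians is not shown.

```python
import numpy as np


def simulate_fmnb2(n, w, beta, alpha, rng):
    """Draw n counts from a two-component finite mixture of NB2 regressions.

    Hypothetical parameterization (assumption, not taken from the paper):
      w     : mixing weight of component 0 (component 1 has weight 1 - w)
      beta  : array of shape (2, 2); row k holds (intercept, slope) of component k
      alpha : length-2 array of dispersion parameters, Var(y) = mu + alpha * mu**2
    """
    x = rng.uniform(0.0, 1.0, size=n)            # one hypothetical covariate
    k = (rng.random(n) >= w).astype(int)         # latent component: 0 with probability w
    mu = np.exp(beta[k, 0] + beta[k, 1] * x)     # log-linear mean of the assigned component
    r = 1.0 / alpha[k]                           # NB "size" parameter (inverse dispersion)
    p = r / (r + mu)                             # NumPy's success probability
    y = rng.negative_binomial(r, p)              # mean mu, variance mu + alpha * mu**2
    return x, y, k


def bias(estimates, true_value):
    """Bias of a summary statistic: its average over replications minus the true value."""
    return float(np.mean(estimates) - true_value)


if __name__ == "__main__":
    rng = np.random.default_rng(2010)
    beta = np.array([[0.5, 1.0], [1.5, -0.5]])   # illustrative values only
    alpha = np.array([0.5, 2.0])
    x, y, k = simulate_fmnb2(1_000, 0.6, beta, alpha, rng)
    print("sample mean of simulated counts:", y.mean())
```

In a study of this kind, one would repeat the loop "simulate a dataset, fit the mixture, record the posterior mean and median of each dispersion parameter" many times for each sample size and sample-mean setting, then pass each list of recorded statistics to `bias` together with the true alpha used to generate the data.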

MeSH terms

  • Accidents, Traffic / statistics & numerical data*
  • Bayes Theorem
  • Bias
  • Binomial Distribution
  • Computer Simulation
  • Humans
  • Models, Statistical*
  • Monte Carlo Method