A bivariate zero-inflated negative binomial model and its applications to biomedical settings

Stat Methods Med Res. 2023 Jul;32(7):1300-1317. doi: 10.1177/09622802231172028. Epub 2023 May 11.

Abstract

The zero-inflated negative binomial distribution has been widely used for count data analyses in various biomedical settings because it can model both excess zeros and overdispersion. When there are correlated count variables, a bivariate model is essential for understanding their full distributional features. Examples include measuring the correlation of two genes in sparse single-cell RNA sequencing (scRNA-seq) data and modeling dental caries count indices on two different tooth surface types. For these purposes, we develop a richly parametrized bivariate zero-inflated negative binomial model that has a simple latent variable framework and eight free parameters with intuitive interpretations. In the scRNA-seq data example, the correlation is estimated after adjusting for the effects of dropout events represented by excess zeros. In the dental caries data, we analyze how treatment with xylitol lozenges affects the marginal mean and other patterns of response manifested in the two dental caries traits. An R package "bzinb" is available on the Comprehensive R Archive Network (CRAN).
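
To make "excess zeros and overdispersion" concrete, the following is a minimal sketch of the univariate zero-inflated negative binomial probability mass function in Python. It is an illustration only: it is not the bivariate model developed in the paper and does not reproduce the interface of the "bzinb" package. The function names and the parametrization by mean `mu`, dispersion `size`, and zero-inflation probability `pi` are assumptions chosen for this example.

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial pmf with mean mu and dispersion size (assumed parametrization)."""
    p = size / (size + mu)  # success probability implied by mean and size
    log_pmf = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(p) + y * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(y, mu, size, pi):
    """Mixture of a point mass at zero (weight pi) and a negative binomial (weight 1 - pi)."""
    structural_zero = pi if y == 0 else 0.0
    return structural_zero + (1.0 - pi) * nb_pmf(y, mu, size)


def zinb_mean_var(mu, size, pi):
    """Closed-form mean and variance of the zero-inflated mixture."""
    mean = (1.0 - pi) * mu
    var = (1.0 - pi) * mu * (1.0 + mu / size + pi * mu)
    return mean, var


if __name__ == "__main__":
    mu, size, pi = 3.0, 2.0, 0.3  # hypothetical values for illustration
    print("P(Y = 0), NB only:", nb_pmf(0, mu, size))
    print("P(Y = 0), ZINB   :", zinb_pmf(0, mu, size, pi))
    print("mean, variance   :", zinb_mean_var(mu, size, pi))
```

The zero-inflation weight raises the probability of a zero count above what the negative binomial alone assigns, while the dispersion parameter lets the variance exceed the mean; both features appear in the closed-form variance returned by `zinb_mean_var`.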

Keywords: Bivariate zero-inflated negative binomial model; dental caries; expectation-maximization algorithm; single-cell RNA sequencing.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binomial Distribution
  • Data Analysis
  • Dental Caries*
  • Humans
  • Models, Statistical
  • Poisson Distribution