Count data models for outpatient health services utilisation

BMC Med Res Methodol. 2022 Oct 5;22(1):261. doi: 10.1186/s12874-022-01733-3.

Abstract

Background: Count data from national surveys capture healthcare utilisation within a specific reference period, resulting in excess zeros and positively skewed tails. Such data are often modelled using count data models. This study aims to identify the best-fitting model for outpatient healthcare utilisation using data from the Malaysian National Health and Morbidity Survey 2019 (NHMS 2019), and to identify the factors associated with utilisation among adults in Malaysia.

Methods: The frequency of outpatient visits is the dependent variable, and instrumental variable selection is based on Andersen's model. Six different models were used: ordinary least squares (OLS), Poisson regression, negative binomial (NB) regression, two zero-inflated models (zero-inflated Poisson and marginalized zero-inflated negative binomial, MZINB), and a hurdle model. The best-fitting model was identified using model selection criteria, goodness-of-fit, and statistical tests of the factors associated with outpatient visits.

Results: Zero visits accounted for 90% of observations. Of the sample, 8.35% of adults utilised healthcare services only once, and 1.04% utilised them twice. The mean-variance value varied between 0.14 and 0.39. Across the six models, the zero-inflated model (ZIM) possesses the smallest log-likelihood, Akaike information criterion, and Bayesian information criterion, and a positive Vuong corrected value. Fourteen instrumental variables (five predisposing factors, six enablers, and three need factors) were identified. Data overdispersion is characterized by excess zeros, a large mean to variance value, and skewed positive tails. We assumed frequency and true zeros throughout the study reference period. ZIM is the best-fitting model based on the model selection criteria, smallest Root Mean Square Error (RMSE) and higher R². Both the corrected and uncorrected Vuong statistics, obtained with different Stata commands, were positive and differed only slightly.
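
The comparison of information criteria can be illustrated with a minimal Python sketch. It is not the study's Stata analysis. It fits intercept-only Poisson and zero-inflated Poisson models, without covariates, to hypothetical counts chosen only to resemble the reported share of zeros. The counts, the function names, and the EM fitting routine are all assumptions made for illustration.

```python
import math

# Hypothetical counts (NOT the NHMS 2019 data): 90% zeros and a short right tail.
counts = [0] * 900 + [1] * 84 + [2] * 10 + [3] * 6


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def zip_logpmf(k, pi, lam):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a draw from Poisson(lam)."""
    if k == 0:
        return math.log(pi + (1.0 - pi) * math.exp(-lam))
    return math.log(1.0 - pi) + poisson_logpmf(k, lam)


def fit_poisson(data):
    """Intercept-only Poisson: the maximum-likelihood rate is the sample mean."""
    lam = sum(data) / len(data)
    return lam, sum(poisson_logpmf(k, lam) for k in data)


def fit_zip(data, n_iter=2000):
    """Intercept-only zero-inflated Poisson fitted by a simple EM loop."""
    n = len(data)
    n_zero = sum(1 for k in data if k == 0)
    total = sum(data)
    pi, lam = 0.5, total / n  # starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing proportion and the Poisson rate.
        pi = n_zero * w / n
        lam = total / (n - n_zero * w)
    loglik = sum(zip_logpmf(k, pi, lam) for k in data)
    return pi, lam, loglik


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * loglik


n_obs = len(counts)
mean = sum(counts) / n_obs
variance = sum((k - mean) ** 2 for k in counts) / (n_obs - 1)

lam_pois, ll_pois = fit_poisson(counts)
pi_zip, lam_zip, ll_zip = fit_zip(counts)

aic_pois, bic_pois = aic(ll_pois, 1), bic(ll_pois, 1, n_obs)
aic_zip, bic_zip = aic(ll_zip, 2), bic(ll_zip, 2, n_obs)

print(f"mean={mean:.3f} variance={variance:.3f}")
print(f"Poisson: AIC={aic_pois:.1f} BIC={bic_pois:.1f}")
print(f"ZIP:     AIC={aic_zip:.1f} BIC={bic_zip:.1f}")
```

On these illustrative counts the zero-inflated model has the lower AIC and BIC, since the extra mixing parameter absorbs the surplus of zeros that a single Poisson rate cannot reproduce. A full analysis would add covariates and the other candidate models, for example with a statistics package.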

Conclusion: State of residence, ethnicity, household income quintile, and health needs were significantly associated with healthcare utilisation. Our findings suggest using ZIM over traditional OLS. This study encourages the use of this count data model because it fits better, is easy to interpret, and makes assumptions appropriate to the survey methodology.

Keywords: Count model; Health behavioral model; Healthcare utilisation; Outpatient; Zero-inflated model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Ambulatory Care
  • Bayes Theorem
  • Humans
  • Models, Statistical*
  • Outpatients*
  • Patient Acceptance of Health Care
  • Poisson Distribution