Sample size issues in multilevel logistic regression models

PLoS One. 2019 Nov 22;14(11):e0225427. doi: 10.1371/journal.pone.0225427. eCollection 2019.

Abstract

Educational researchers, psychologists, and social, epidemiological, and medical scientists often deal with multilevel data. Sometimes the response variable in multilevel data is categorical and needs to be analyzed with multilevel logistic regression models. The main aim of this paper is to provide guidelines for analysts selecting an appropriate sample size when fitting multilevel logistic regression models under different threshold parameters and different estimation methods. Simulation studies were performed to obtain the optimum sample size for the Penalized Quasi-likelihood (PQL) and Maximum Likelihood (ML) methods of estimation. Our results suggest that the ML method performs better than the PQL method and requires a relatively small sample under the chosen conditions. To achieve sufficient accuracy of fixed and random effects under the ML method, we established "50/50" and "120/50" rules, respectively. On the basis of our findings, "50/60" and "120/70" rules are also recommended under the PQL method of estimation.
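The kind of data such a simulation study draws on can be sketched as follows. This is a minimal illustration, not the authors' simulation design: it assumes a two-level random-intercept logistic model with a single level-1 covariate, and the function name, the parameter values (beta0, beta1, sigma_u), and the group counts and sizes are all hypothetical choices not taken from the abstract.

```python
import numpy as np

def simulate_two_level_logistic(n_groups, group_size,
                                beta0=-1.0, beta1=0.5, sigma_u=1.0, seed=0):
    """Draw one data set from a random-intercept logistic model.

    All default parameter values are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    # Group label for every level-1 unit (balanced design assumed).
    group = np.repeat(np.arange(n_groups), group_size)
    # One normally distributed random intercept per group.
    u = rng.normal(0.0, sigma_u, size=n_groups)
    # A single standard-normal level-1 covariate.
    x = rng.normal(size=n_groups * group_size)
    # Linear predictor on the logit scale, then inverse-logit.
    eta = beta0 + beta1 * x + u[group]
    p = 1.0 / (1.0 + np.exp(-eta))
    # Binary response drawn from the implied probabilities.
    y = rng.binomial(1, p)
    return group, x, y

# Hypothetical design: 50 groups with 50 units each.
group, x, y = simulate_two_level_logistic(n_groups=50, group_size=50)
```

In a full study, data sets like this would be generated repeatedly for each combination of number of groups and group size, the model refitted with each estimation method, and the bias and coverage of the estimates compared across designs.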

MeSH terms

  • Computer Simulation
  • Guidelines as Topic
  • Humans
  • Likelihood Functions
  • Logistic Models
  • Multilevel Analysis / methods*
  • Research Design / standards*
  • Sample Size

Grants and funding

The authors received no specific funding for this work.