A simulation study for evaluating the performance of clustering measures in multilevel logistic regression

BMC Med Res Methodol. 2021 Nov 13;21(1):245. doi: 10.1186/s12874-021-01417-4.

Abstract

Background: Multilevel logistic regression models are widely used in health sciences research to account for clustering in multilevel data when estimating effects on subject binary outcomes of individual-level and cluster-level covariates. Several measures for quantifying between-cluster heterogeneity have been proposed. This study compared the performance of between-cluster variance based heterogeneity measures (the Intra-class Correlation Coefficient (ICC) and the Median Odds Ratio (MOR)), and cluster-level covariate based heterogeneity measures (the 80% Interval Odds Ratio (IOR-80) and the Sorting Out Index (SOI)).

Methods: We used several simulation datasets of a two-level logistic regression model to assess the performance of the four clustering measures for a multilevel logistic regression model. We also empirically compared the four measures of cluster variation with an analysis of childhood anemia to investigate the importance of unexplained heterogeneity between communities and community geographic type (rural vs urban) effect in Malawi.

Results: Our findings showed that the estimates of SOI and ICC were generally unbiased with at least 10 clusters and a cluster size of at least 20. On the other hand, estimates of MOR and IOR-80 were less accurate with 50 or fewer clusters regardless of the cluster size. The performance of the four clustering measures improved with increased clusters and cluster size at all cluster variances. In the analysis of childhood anemia, the estimate of the between-community variance was 0.455, and the effect of community geographic type (rural vs urban) had an odds ratio (OR)=1.21 (95% CI: 0.97, 1.52). The resulting estimates of ICC, MOR, IOR-80 and SOI were 0.122 (indicative of low homogeneity of childhood anemia in the same community); 1.898 (indicative of large unexplained heterogeneity); 0.345-3.978 and 56.7% (implying that the between community heterogeneity was more significant in explaining the variations in childhood anemia than the estimated effect of community geographic type (rural vs urban)), respectively.

Conclusion: At least 300 clusters with sizes of at least 50 would be adequate to estimate the strength of clustering in multilevel logistic regression with negligible bias. We recommend using the SOI to assess unexplained heterogeneity between clusters when the interest also involves the effect of cluster-level covariates, otherwise, the usual intra-cluster correlation coefficient would suffice in multilevel logistic regression analyses.

Keywords: Binary outcomes; Childhood anemia; Clusters; Heterogeneity measures; Multilevel logistic regression.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Computer Simulation
  • Humans
  • Logistic Models*
  • Multilevel Analysis
  • Odds Ratio