Causal Inference with Multilevel Data: A Comparison of Different Propensity Score Weighting Approaches

Multivariate Behav Res. 2022 Nov-Dec;57(6):916-939. doi: 10.1080/00273171.2021.1925521. Epub 2021 Jun 15.

Abstract

Propensity score methods are a widely recommended approach to adjust for confounding and to recover treatment effects with non-experimental, single-level data. This article reviews propensity score weighting estimators for multilevel data in which individuals (level 1) are nested in clusters (level 2) and nonrandomly assigned to either a treatment or control condition at level 1. We address the choice of a weighting strategy (inverse probability weights, trimming, overlap weights, calibration weights) and discuss key issues related to the specification of the propensity score model (fixed-effects model, multilevel random-effects model) in the context of multilevel data. In three simulation studies, we show that estimates based on calibration weights, which prioritize balancing the sample distribution of level-1 and (unmeasured) level-2 covariates, should be preferred under many scenarios (i.e., treatment effect heterogeneity, presence of strong level-2 confounding) and can accommodate covariate-by-cluster interactions. However, when level-1 covariate effects vary strongly across clusters (i.e., under random slopes), and this variation is present in both the treatment and outcome data-generating mechanisms, large cluster sizes are needed to obtain accurate estimates of the treatment effect. We also discuss the implementation of survey weights and present a real-data example that illustrates the different methods.
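As a rough illustration of three of the weighting strategies named above (inverse probability weights, trimming, overlap weights), the following minimal sketch computes each set of weights from an already-estimated propensity score. It is not the authors' implementation: the function names, the trimming thresholds of 0.1 and 0.9, and the Hajek-type weighted mean difference are assumptions chosen for illustration. The propensity score `e` is assumed to come from whichever model (fixed-effects or multilevel random-effects) the analyst has fitted. Calibration weights are not sketched because they require solving a balancing optimization problem.

```python
import numpy as np

def ipw_weights(z, e):
    """Inverse probability weights (ATE): 1/e for treated, 1/(1-e) for controls."""
    z = np.asarray(z, dtype=float)
    e = np.asarray(e, dtype=float)
    return z / e + (1.0 - z) / (1.0 - e)

def trimmed_ipw_weights(z, e, lo=0.1, hi=0.9):
    """IPW with units whose propensity score lies outside [lo, hi] given weight 0.
    The thresholds are illustrative assumptions, not values from the article."""
    e = np.asarray(e, dtype=float)
    keep = (e >= lo) & (e <= hi)
    return np.where(keep, ipw_weights(z, e), 0.0)

def overlap_weights(z, e):
    """Overlap weights: 1-e for treated, e for controls."""
    z = np.asarray(z, dtype=float)
    e = np.asarray(e, dtype=float)
    return z * (1.0 - e) + (1.0 - z) * e

def weighted_mean_difference(y, z, w):
    """Weighted mean of treated outcomes minus weighted mean of control outcomes,
    with weights normalized within each group."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    w = np.asarray(w, dtype=float)
    treated = z == 1
    return (np.average(y[treated], weights=w[treated])
            - np.average(y[~treated], weights=w[~treated]))

# Toy data: four units, hypothetical propensity scores, constant effect of 2.
z = np.array([1, 0, 1, 0])
e = np.array([0.5, 0.5, 0.8, 0.05])
y = 2.0 * z + 1.0

for name, w in [("ipw", ipw_weights(z, e)),
                ("trimmed", trimmed_ipw_weights(z, e)),
                ("overlap", overlap_weights(z, e))]:
    print(name, w, weighted_mean_difference(y, z, w))
```

In this toy example the control unit with a propensity score of 0.05 receives weight 0 under trimming and a small weight under overlap weighting, which shows how both strategies limit the influence of units in regions of poor overlap. Note that trimming and overlap weighting change the target population, so their estimates need not coincide with the inverse-probability-weighted estimate when treatment effects are heterogeneous.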

Keywords: Causal inference; calibration weights; multilevel data; propensity scores; weighting.

Publication types

  • Review

MeSH terms

  • Causality
  • Computer Simulation
  • Humans
  • Propensity Score*
  • Surveys and Questionnaires