A Cautionary Note on Using Propensity Score Calibration to Control for Unmeasured Confounding Bias When the Surrogacy Assumption Is Absent

Am J Epidemiol. 2024 Feb 5;193(2):360-369. doi: 10.1093/aje/kwad189.

Abstract

Conventional propensity score methods encounter challenges when unmeasured confounding is present, as it becomes impossible to accurately estimate the gold-standard propensity score when data on certain confounders are unavailable. Propensity score calibration (PSC) addresses this issue by constructing a surrogate for the gold-standard propensity score under the surrogacy assumption. This assumption posits that the error-prone propensity score, based on observed confounders, is independent of the outcome conditional on the gold-standard propensity score and the exposure. However, this assumption implies that confounders cannot directly impact the outcome and that their effects on the outcome are solely mediated through the propensity score. This raises concerns regarding the applicability of PSC in practical settings where confounders can directly affect the outcome. While PSC targets a conditional treatment effect by conditioning on a subject's unobservable propensity score, the causal interest in such settings lies in a treatment effect conditional on a subject's baseline characteristics. Our analysis reveals that PSC is generally biased unless the effects of confounders on the outcome and treatment are proportional to each other. Furthermore, we identify 2 sources of bias: 1) the noncollapsibility of effect measures, such as the odds ratio or hazard ratio, and 2) residual confounding, as the calibrated propensity score may not possess the properties of a valid propensity score.
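
The first source of bias, noncollapsibility, can be illustrated with a small numerical sketch. It is not taken from the article: the logistic outcome model, the binary covariate X, and all coefficient values below are assumptions chosen only for illustration. Because X is generated independently of treatment, there is no confounding, yet the marginal odds ratio still differs from the odds ratio conditional on X.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p):
    return p / (1.0 - p)


def marginal_odds_ratio(beta_treat, beta_cov, p_cov=0.5, intercept=-1.0):
    """Marginal odds ratio for treatment A under an assumed outcome model.

    Assumed (hypothetical) model:
        logit P(Y=1 | A, X) = intercept + beta_treat * A + beta_cov * X
    with X ~ Bernoulli(p_cov) independent of A, so X is not a confounder.
    """
    def risk(a):
        # Average the conditional risk over the distribution of X.
        return sum(
            weight * expit(intercept + beta_treat * a + beta_cov * x)
            for x, weight in ((0, 1.0 - p_cov), (1, p_cov))
        )

    return odds(risk(1)) / odds(risk(0))


# Conditional odds ratio: identical in both strata of X by construction.
conditional_or = math.exp(1.0)

# Marginal odds ratio when X strongly affects the outcome.
marginal_or = marginal_odds_ratio(beta_treat=1.0, beta_cov=2.0)

# Marginal odds ratio when X has no effect on the outcome.
marginal_or_null_cov = marginal_odds_ratio(beta_treat=1.0, beta_cov=0.0)

print(f"conditional OR:            {conditional_or:.3f}")
print(f"marginal OR (beta_cov=2):  {marginal_or:.3f}")
print(f"marginal OR (beta_cov=0):  {marginal_or_null_cov:.3f}")
```

Under these assumed values the marginal odds ratio is closer to 1 than the conditional one, and the two coincide only when X has no effect on the outcome. This suggests why an estimate conditioned on a propensity score need not match one conditioned on baseline characteristics when the effect measure is an odds ratio or hazard ratio.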

Keywords: confounding bias; linear models; noncollapsibility; nonlinear models; propensity score calibration.

MeSH terms

  • Bias
  • Calibration*
  • Confounding Factors, Epidemiologic
  • Humans
  • Propensity Score
  • Proportional Hazards Models