The validity of causal claims with repeated measures designs: A within-study comparison evaluation of differences-in-differences and the comparative interrupted time series

Kylie L Anglin; Vivian C Wong; Coady Wing; Kate Miller-Bains; Kevin McConeghy

doi:10.1177/0193841X231167672

Eval Rev. 2023 Oct;47(5):895-931. doi: 10.1177/0193841X231167672. Epub 2023 Apr 18.

Authors

Kylie L Anglin¹, Vivian C Wong², Coady Wing³, Kate Miller-Bains², Kevin McConeghy⁴

Affiliations

¹ Neag School of Education, University of Connecticut, Storrs, CT, USA.
² School of Education and Human Development, University of Virginia, Charlottesville, VA, USA.
³ Paul H. O'Neill School of Public and Environmental Affairs, Indiana University, Bloomington, IN, USA.
⁴ School of Public Health, Brown University, Providence, RI, USA.

PMID: 37072684
DOI: 10.1177/0193841X231167672

Abstract

Modern policies are commonly evaluated not with randomized experiments but with repeated measures designs like difference-in-differences (DID) and the comparative interrupted time series (CITS). The key benefit of these designs is that they control for unobserved confounders that are fixed over time. However, DID and CITS designs only result in unbiased impact estimates when the model assumptions are consistent with the data at hand. In this paper, we empirically test whether the assumptions of repeated measures designs are met in field settings. Using a within-study comparison design, we compare experimental estimates of the impact of patient-directed care on medical expenditures to non-experimental DID and CITS estimates for the same target population and outcome. Our data come from a multi-site experiment that includes participants receiving Medicaid in Arkansas, Florida, and New Jersey. We present summary measures of repeated measures bias across three states, four comparison groups, two model specifications, and two outcomes. We find that, on average, bias resulting from repeated measures designs is very close to zero (less than 0.01 standard deviations; SDs). Further, we find that comparison groups that have pre-treatment trends visibly parallel to those of the treatment group result in less bias than comparison groups with visibly divergent trends. However, CITS models that control for baseline trends produced slightly more bias and were less precise than DID models that only control for baseline means. Overall, we offer optimistic evidence in favor of repeated measures designs when randomization is not feasible.
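The contrast the abstract draws between DID (adjusting for baseline means) and CITS (adjusting for baseline trends) can be illustrated with a minimal sketch. This is not the authors' code or model specification: the function names, the linear pre-period trend, and the toy numbers are all assumptions made for illustration, and the data are hypothetical rather than drawn from the study.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, comp_pre, comp_post):
    """DID: change in the treated group's mean minus change in the comparison group's mean."""
    treat_change = mean(treat_post) - mean(treat_pre)
    comp_change = mean(comp_post) - mean(comp_pre)
    return treat_change - comp_change


def _linear_fit(ys):
    """OLS intercept and slope of ys on the time index 0..len(ys)-1 (assumed linear trend)."""
    ts = list(range(len(ys)))
    t_bar, y_bar = mean(ts), mean(ys)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def _deviation_from_trend(pre, post):
    """Mean gap between observed post-period outcomes and the projected pre-period trend."""
    intercept, slope = _linear_fit(pre)
    projected = [intercept + slope * (len(pre) + k) for k in range(len(post))]
    return mean([y - p for y, p in zip(post, projected)])


def cits_estimate(treat_pre, treat_post, comp_pre, comp_post):
    """CITS sketch: treated deviation from its baseline trend minus the comparison deviation."""
    return _deviation_from_trend(treat_pre, treat_post) - _deviation_from_trend(comp_pre, comp_post)


def standardized_bias(nonexperimental, benchmark, outcome_sd):
    """Within-study comparison bias: gap from the experimental benchmark, in SD units."""
    return (nonexperimental - benchmark) / outcome_sd


# Hypothetical group-mean outcomes per period (not data from the study).
treat_pre, treat_post = [10, 11, 12, 13], [16, 17]
parallel_pre, parallel_post = [8, 9, 10, 11], [12, 13]   # parallel pre-trend
divergent_pre, divergent_post = [8, 8, 8, 8], [8, 8]     # flat, divergent pre-trend

did_parallel = did_estimate(treat_pre, treat_post, parallel_pre, parallel_post)
cits_parallel = cits_estimate(treat_pre, treat_post, parallel_pre, parallel_post)
did_divergent = did_estimate(treat_pre, treat_post, divergent_pre, divergent_post)
cits_divergent = cits_estimate(treat_pre, treat_post, divergent_pre, divergent_post)
```

In this toy setup the two estimators agree when the comparison group's pre-treatment trend is parallel to the treatment group's and disagree when it is not, because DID attributes the difference in trends to the treatment. Whether either estimate is closer to the truth in practice is the empirical question the within-study comparison addresses, by measuring each non-experimental estimate against the experimental benchmark with something like `standardized_bias`.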

Keywords: Within-study comparison; causal inference; design replication; quasi-experimental design.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Arkansas
Causality
Florida
Humans
Interrupted Time Series Analysis
Research Design*
United States