Using a Multi-Site RCT to Predict Impacts for a Single Site: Do Better Data and Methods Yield More Accurate Predictions?

Robert B Olsen; Larry L Orr; Stephen H Bell; Elizabeth Petraglia; Elena Badillo-Goicoechea; Atsushi Miyaoka; Elizabeth A Stuart

doi:10.1080/19345747.2023.2180464

J Res Educ Eff. 2024;17(1):184-210. doi: 10.1080/19345747.2023.2180464. Epub 2023 Apr 13.

Authors

Robert B Olsen¹, Larry L Orr², Stephen H Bell³, Elizabeth Petraglia⁴, Elena Badillo-Goicoechea⁵, Atsushi Miyaoka⁴, Elizabeth A Stuart⁶

Affiliations

¹ George Washington Institute of Public Policy, The George Washington University, Washington, DC 20052.
² Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Chevy Chase, MD 20815.
³ Bell Eval LLC, Kensington, Maryland, USA.
⁴ Westat, Rockville, MD 20850.
⁵ Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205.
⁶ Departments of Mental Health, Biostatistics, and Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205.

PMID: 38450254
PMCID: PMC10914338 (available on 2025-01-01)
DOI: 10.1080/19345747.2023.2180464

Abstract

Multi-site randomized controlled trials (RCTs) provide unbiased estimates of the average impact in the study sample. However, their ability to accurately predict the impact for individual sites outside the study sample, to inform local policy decisions, is largely unknown. To extend prior research on this question, we analyzed six multi-site RCTs and tested modern prediction methods, namely lasso regression and Bayesian Additive Regression Trees (BART), using a wide range of moderator variables. The main study findings are that: (1) all of the methods yielded accurate impact predictions when the variation in impacts across sites was close to zero (as expected); (2) none of the methods yielded accurate impact predictions when the variation in impacts across sites was substantial; and (3) BART typically produced "less inaccurate" predictions than either lasso regression or the Sample Average Treatment Effect. These results raise concerns that when the impact of an intervention varies considerably across sites, statistical modelling using the data commonly collected by multi-site RCTs will be insufficient to explain the variation in impacts across sites and accurately predict impacts for individual sites.
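
The contrast between findings (1) and (2) can be illustrated with a toy simulation. The sketch below is not the authors' procedure: it assumes true site-level impacts are known without sampling error, draws them from a normal distribution, and predicts each held-out site's impact with the plain average of the remaining sites (a stand-in for using the Sample Average Treatment Effect as the prediction). The function names, the number of sites, and the mean and standard deviations are all hypothetical.

```python
import random
import statistics


def leave_one_site_out_rmse(site_impacts):
    """RMSE when each site's impact is predicted by the mean of the other sites."""
    squared_errors = []
    for i, actual in enumerate(site_impacts):
        others = site_impacts[:i] + site_impacts[i + 1:]
        predicted = statistics.mean(others)  # average impact of the remaining sites
        squared_errors.append((predicted - actual) ** 2)
    return statistics.mean(squared_errors) ** 0.5


def simulate_site_impacts(n_sites, mean_impact, cross_site_sd, seed):
    """Draw hypothetical true site impacts from a normal distribution (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_impact, cross_site_sd) for _ in range(n_sites)]


# Same seed, so the two scenarios differ only in the cross-site standard deviation.
low_variation = simulate_site_impacts(20, mean_impact=0.10, cross_site_sd=0.01, seed=0)
high_variation = simulate_site_impacts(20, mean_impact=0.10, cross_site_sd=0.20, seed=0)

rmse_low = leave_one_site_out_rmse(low_variation)
rmse_high = leave_one_site_out_rmse(high_variation)

print(f"RMSE with near-zero cross-site variation:   {rmse_low:.4f}")
print(f"RMSE with substantial cross-site variation: {rmse_high:.4f}")
```

In this simplified setup the prediction error of the average grows in direct proportion to the cross-site standard deviation, which is why a method can only improve on the average if the available moderator variables actually explain the variation in impacts.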

Keywords: Randomized controlled trials; evidence-based policy; external validity; generalizability; transportability.

Grants and funding

P50 MH115842/MH/NIMH NIH HHS/United States