Methods for Integrating Trials and Non-experimental Data to Examine Treatment Effect Heterogeneity

Carly Lupton Brantner; Ting-Hsuan Chang; Trang Quynh Nguyen; Hwanhee Hong; Leon Di Stefano; Elizabeth A Stuart

doi:10.1214/23-sts890

Stat Sci. 2023 Nov;38(4):640-654. doi: 10.1214/23-sts890. Epub 2023 Nov 6.

Authors

Carly Lupton Brantner¹, Ting-Hsuan Chang², Trang Quynh Nguyen³, Hwanhee Hong⁴, Leon Di Stefano¹, Elizabeth A Stuart⁵

Affiliations

¹ Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA.
² Department of Biostatistics, Columbia Mailman School of Public Health, New York, New York 10032, USA.
³ Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA.
⁴ Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina 27710, USA.
⁵ Departments of Biostatistics, Mental Health, and Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA.

Abstract

Estimating treatment effects conditional on observed covariates can improve the ability to tailor treatments to particular individuals. Doing so effectively requires addressing potential confounding and having enough data to adequately estimate effect moderation. A recent influx of work has looked into estimating treatment effect heterogeneity using data from multiple randomized controlled trials and/or observational datasets. With many new methods available for assessing treatment effect heterogeneity using multiple studies, it is important to understand which methods are best used in which setting, how the methods compare to one another, and what needs to be done to continue progress in this field. This paper reviews these methods, organized by data setting: aggregate-level data, federated learning, and individual participant-level data. We define the conditional average treatment effect and discuss differences between parametric and nonparametric estimators, and we list key assumptions, both those that are required within a single study and those that are necessary for data combination. After describing existing approaches, we compare and contrast them and identify open areas for future research. This review demonstrates that there are many possible approaches for estimating treatment effect heterogeneity through the combination of datasets, but that there is substantial work to be done to compare these methods through case studies and simulations, extend them to different settings, and refine them to account for various challenges present in real data.

Keywords: Treatment effect heterogeneity; combining data; generalizability and reproducibility.
