Deep Learning-based Propensity Scores for Confounding Control in Comparative Effectiveness Research: A Large-scale, Real-world Data Study

Janick Weberpals; Tim Becker; Jessica Davies; Fabian Schmich; Dominik Rüttinger; Fabian J Theis; Anna Bauer-Mehren

doi:10.1097/EDE.0000000000001338

Epidemiology. 2021 May 1;32(3):378-388. doi: 10.1097/EDE.0000000000001338.

Authors

Janick Weberpals¹, Tim Becker², Jessica Davies³, Fabian Schmich¹, Dominik Rüttinger⁴, Fabian J Theis⁵,⁶, Anna Bauer-Mehren¹

Affiliations

¹ Data Science, Pharmaceutical Research and Early Development Informatics (pREDi), Roche Innovation Center Munich (RICM), Penzberg, Germany.
² xValue GmbH, Willich, Germany, on behalf of Data Science IV, Pharmaceutical Research and Early Development Informatics (pREDi), Roche Innovation Center Munich (RICM), Penzberg, Germany.
³ F. Hoffmann-La Roche Ltd, Welwyn Garden City, United Kingdom.
⁴ Early Clinical Development Oncology, Pharmaceutical Research and Early Development (pRED), Roche Innovation Center Munich (RICM), Penzberg, Germany.
⁵ Institute of Computational Biology, German Research Center for Environmental Health, Helmholtz Center Munich, Neuherberg, Germany.
⁶ Department of Mathematics, Technical University of Munich, Garching, Germany.

PMID: 33591049
DOI: 10.1097/EDE.0000000000001338

Abstract

Background: Due to the non-randomized nature of real-world data, prognostic factors need to be balanced, which is often done by propensity scores (PSs). This study aimed to investigate whether autoencoders, which are unsupervised deep learning architectures, might be leveraged to compute PS.

Methods: We selected patient-level data of 128,368 first-line treated cancer patients from the Flatiron Health EHR-derived de-identified database. We trained an autoencoder architecture to learn a lower-dimensional patient representation, which we used to compute PS. To compare the performance of an autoencoder-based PS with established methods, we performed a simulation study. We assessed the balancing and adjustment performance using standardized mean differences, root mean square errors (RMSE), percent bias, and confidence interval coverage. To illustrate the application of the autoencoder-based PS, we emulated the PRONOUNCE trial by applying the trial's protocol elements within an observational database setting, comparing two chemotherapy regimens.
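The balance diagnostic named above, the standardized mean difference (SMD), can be illustrated with a minimal sketch. It assumes the common pooled-standard-deviation definition for a continuous covariate; the abstract does not state which SMD variant the authors used, and the function name and the age values below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute SMD for one continuous covariate.

    Assumption: the denominator is the pooled standard deviation,
    sqrt((var_treated + var_control) / 2), with sample variances.
    """
    diff = mean(treated) - mean(control)
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # Both groups are constant: balanced only if the means coincide.
        return 0.0 if diff == 0 else float("inf")
    return abs(diff) / pooled_sd


# Hypothetical ages (years) in two treatment groups after PS adjustment.
treated_age = [61, 64, 58, 70, 66]
control_age = [60, 65, 59, 69, 67]

smd_age = standardized_mean_difference(treated_age, control_age)
print(f"SMD for age: {smd_age:.3f}")
```

Under this definition, a covariate is conventionally regarded as well balanced when its SMD is below 0.1, which is the threshold the Results refer to.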

Results: All methods except the manual variable selection approach led to well-balanced cohorts with average standardized mean differences <0.1. LASSO yielded, on average, the lowest deviation of the resulting estimates (RMSE 0.0205), followed by the autoencoder approach (RMSE 0.0248). When the hyperparameter setup was altered in a sensitivity analysis, the autoencoder approach gave results similar to those of LASSO (RMSE 0.0203 and 0.0205, respectively). In the case study, all methods led to a similar conclusion, with point estimates clustered around the null (e.g., HR_autoencoder 1.01 [95% confidence interval = 0.80, 1.27] vs. HR_PRONOUNCE 1.07 [0.83, 1.36]).
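The simulation performance measures (RMSE, percent bias, and confidence interval coverage) can be sketched as follows. This is an illustration using the textbook definitions, not the authors' code; the true effect, the estimates, and the intervals are made-up values, and the function names are hypothetical.

```python
from math import sqrt


def rmse(estimates, truth):
    """Root mean square error of the estimates around the true effect."""
    return sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def percent_bias(estimates, truth):
    """Mean deviation from the truth, as a percentage of the truth.

    Assumes truth != 0 (for example, a non-null log hazard ratio).
    """
    mean_estimate = sum(estimates) / len(estimates)
    return 100 * (mean_estimate - truth) / truth


def coverage(intervals, truth):
    """Fraction of confidence intervals that contain the true effect."""
    return sum(lo <= truth <= hi for lo, hi in intervals) / len(intervals)


# Hypothetical simulation output: four replicates with a true effect of 0.5.
true_effect = 0.5
estimates = [0.48, 0.52, 0.51, 0.47]
intervals = [(0.40, 0.56), (0.44, 0.60), (0.52, 0.62), (0.39, 0.55)]

sim_rmse = rmse(estimates, true_effect)
sim_bias = percent_bias(estimates, true_effect)
sim_coverage = coverage(intervals, true_effect)
print(f"RMSE {sim_rmse:.4f}, bias {sim_bias:.1f}%, coverage {sim_coverage:.2f}")
```

A lower RMSE means the estimates sit closer to the simulated truth on average, which is the sense in which the LASSO and autoencoder results are compared.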

Conclusions: Autoencoder-based PS computation was a feasible approach to control for confounding but did not perform better than some established approaches like LASSO.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Comparative Effectiveness Research*
Computer Simulation
Databases, Factual
Deep Learning*
Humans
Propensity Score