Studies with many covariates and few outcomes: selecting covariates and implementing propensity-score-based confounding adjustments

Elisabetta Patorno; Robert J Glynn; Sonia Hernández-Díaz; Jun Liu; Sebastian Schneeweiss

doi:10.1097/EDE.0000000000000069

Epidemiology. 2014 Mar;25(2):268-78. doi: 10.1097/EDE.0000000000000069.

Authors

Elisabetta Patorno¹, Robert J Glynn, Sonia Hernández-Díaz, Jun Liu, Sebastian Schneeweiss

Affiliation

¹ From the Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA; and the Department of Epidemiology, Harvard School of Public Health, Boston, MA.

PMID: 24487209

Abstract

Background: Propensity scores are useful for confounding adjustment in the commonly observed setting of many potential confounders, frequent exposure, and rare events. However, with few exposed outcomes to inform covariate selection and many candidate confounders, optimal approaches to construct and implement propensity-score-based confounding adjustment remain unclear.

Methods: In a cohort study on the effect of anticonvulsant drugs on cardiovascular risk among adult patients from the HealthCore Integrated Research Database, we compared the performance for confounding control of various covariate-selection strategies for propensity-score estimation (expert knowledge only, expert knowledge informed by empirical covariate selection via high-dimensional propensity score, and high-dimensional propensity-score empirical specification only) and propensity-score-based adjustment methods (propensity-score matching and propensity-score-decile stratification). This article focuses on the first 90 days of follow-up because any treatment effect identified in this temporal window almost certainly originates from residual confounding rather than pharmacologic action.
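The two adjustment methods named above can be illustrated with a minimal sketch. It assumes propensity scores have already been estimated. The function names, the greedy 1:1 nearest-neighbor rule, and the caliper of 0.05 are illustrative assumptions, not the matching algorithm or settings used in the study.

```python
from bisect import bisect_right


def decile_strata(scores, n_strata=10):
    """Assign each subject a stratum index (0 = lowest scores) by score rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def greedy_match(scores, exposed, caliper=0.05):
    """1:1 greedy nearest-neighbor matching without replacement.

    Returns (exposed_index, unexposed_index) pairs whose propensity
    scores differ by at most `caliper` (an assumed value).
    """
    # Unexposed subjects sorted by score so neighbors can be found by bisection.
    controls = sorted(
        (s, i) for i, (s, e) in enumerate(zip(scores, exposed)) if not e
    )
    pairs = []
    for i, (s, e) in enumerate(zip(scores, exposed)):
        if not e or not controls:
            continue
        pos = bisect_right(controls, (s, i))
        # The closest remaining control is immediately left or right of pos.
        candidates = [c for c in (pos - 1, pos) if 0 <= c < len(controls)]
        best = min(candidates, key=lambda c: abs(controls[c][0] - s))
        if abs(controls[best][0] - s) <= caliper:
            pairs.append((i, controls.pop(best)[1]))
    return pairs


# Made-up scores and exposure flags, for illustration only.
demo_scores = [0.10, 0.12, 0.50, 0.52, 0.90]
demo_exposed = [1, 0, 1, 0, 1]
demo_pairs = greedy_match(demo_scores, demo_exposed)
demo_strata = decile_strata([i / 20 for i in range(20)])
```

In a matched analysis the outcome model would then be fit on the matched pairs only. In a stratified analysis the stratum index would enter the outcome model as a stratification variable.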

Results: We identified 166,031 new users and 564 ischemic cardiovascular events. Among those, 12,580 patients initiated anticonvulsants that strongly induce cytochrome P450 enzymes and experienced 68 events. The unadjusted hazard ratio was 1.72 (95% confidence interval = 1.34-2.22). Adjustment for investigator-identified covariates led to 41% to 59% reductions in the hazard ratio; adjustment for both investigator-identified and high-dimensional propensity-score empirically identified covariates led to larger reductions (54% to 72%). A selection strategy based on high-dimensional propensity-score empirical specification alone produced less-attenuated and more-volatile hazard ratio estimates. This volatility seemed to be slightly attenuated in a trimmed propensity-score-stratified analysis.
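The abstract does not state how the propensity-score distribution was trimmed. As a sketch of one common rule, the function below keeps only subjects whose scores fall in the region where exposed and unexposed score ranges overlap. The function name and this particular rule are assumptions for illustration. Percentile-based trimming is another widely used choice.

```python
def trim_to_common_support(scores, exposed):
    """Return indices of subjects inside the overlapping score range.

    Assumed rule: keep scores between the larger of the two group minima
    and the smaller of the two group maxima.
    """
    exp_scores = [s for s, e in zip(scores, exposed) if e]
    unexp_scores = [s for s, e in zip(scores, exposed) if not e]
    lower = max(min(exp_scores), min(unexp_scores))
    upper = min(max(exp_scores), max(unexp_scores))
    return [i for i, s in enumerate(scores) if lower <= s <= upper]


# Made-up scores and exposure flags, for illustration only.
kept = trim_to_common_support([0.01, 0.2, 0.3, 0.6, 0.95], [0, 0, 1, 0, 1])
```

Stratification would then be applied to the retained subjects only, which removes the tails of the score distribution where comparable subjects in the other group are absent.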

Conclusions: The high-dimensional propensity-score algorithm complements expert knowledge for confounding adjustment, but in settings with few exposed outcomes, its performance without investigator-specified covariates is less clear and may be associated with an increased likelihood of bias. In our example, investigator specification of variables combined with high-dimensional propensity-score empirical selection and the use of trimmed propensity-score-stratified analysis seem to improve effect estimation. Plotting the relation of effect estimates to the increasing number of empirical covariates is a useful diagnostic.
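The diagnostic mentioned in the last sentence can be sketched as follows, assuming a hazard ratio has already been estimated at each number of empirical covariates. The function names, the text-based rendering, and the demo values are hypothetical. The numbers are made up and are not results from the study.

```python
import math


def hr_trajectory(estimates, width=40):
    """Render (n_covariates, hazard_ratio) pairs as rows of a text plot."""
    lo = min(hr for _, hr in estimates)
    hi = max(hr for _, hr in estimates)
    span = (hi - lo) or 1.0  # avoid division by zero when all estimates are equal
    rows = []
    for k, hr in estimates:
        col = round((hr - lo) / span * (width - 1))
        rows.append(f"{k:>4} | " + " " * col + "*" + f"  {hr:.2f}")
    return rows


def max_step_change(estimates):
    """Largest absolute change in log hazard ratio between consecutive points."""
    logs = [math.log(hr) for _, hr in sorted(estimates)]
    return max(abs(b - a) for a, b in zip(logs, logs[1:]))


# Hypothetical estimates: (number of empirical covariates, hazard ratio).
demo = [(0, 2.0), (50, 1.5), (100, 1.6), (200, 1.4)]
for row in hr_trajectory(demo):
    print(row)
```

A trajectory that settles as covariates are added suggests a stable adjustment, whereas large swings between consecutive points, summarized here by `max_step_change`, flag the kind of volatility described in the Results.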

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Adult
Algorithms
Anticonvulsants / adverse effects*
Cardiovascular Diseases / chemically induced*
Confounding Factors, Epidemiologic*
Epidemiologic Research Design*
Female
Follow-Up Studies
Humans
Male
Middle Aged
Outcome Assessment, Health Care
Propensity Score*
Risk Factors

Substances

Anticonvulsants