Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance

Scott A Malec; Peng Wei; Elmer V Bernstam; Richard D Boyce; Trevor Cohen

doi:10.1016/j.jbi.2021.103719

J Biomed Inform. 2021 May:117:103719. doi: 10.1016/j.jbi.2021.103719. Epub 2021 Mar 11.

Authors

Scott A Malec¹, Peng Wei², Elmer V Bernstam³, Richard D Boyce⁴, Trevor Cohen⁵

Affiliations

¹ University of Pittsburgh School of Medicine, Department of Biomedical Informatics, Pittsburgh, PA, United States. Electronic address: sam413@pitt.edu.
² The University of Texas MD Anderson Cancer Center, Department of Biostatistics, Houston, TX, United States.
³ University of Texas Health Science Center at Houston, School of Biomedical Informatics, Houston, TX, United States.
⁴ University of Pittsburgh School of Medicine, Department of Biomedical Informatics, Pittsburgh, PA, United States.
⁵ University of Washington, Department of Biomedical Informatics and Medical Education, Seattle, WA, United States.

Abstract

Introduction: Drug safety research asks causal questions but relies on observational data. Confounding bias threatens the reliability of studies using such data. Controlling for confounding successfully requires knowledge of confounders, that is, variables that affect both the exposure and the outcome of interest. However, causal knowledge of dynamic biological systems is complex and challenging to acquire. Fortunately, computable knowledge mined from the literature may hold clues about confounders. In this paper, we tested the hypothesis that incorporating literature-derived confounders can improve causal inference from observational data.

Methods: We introduce two methods (semantic vector-based and string-based confounder search) that query literature-derived information for confounder candidates to control for, using SemMedDB, a database of computable knowledge mined from the biomedical literature. These methods search SemMedDB for confounders by applying a semantic constraint search for indications that are treated by the drug (exposure) and that are also known to cause the adverse event (outcome). We then include the literature-derived confounder candidates in statistical and causal models derived from free-text clinical notes. For evaluation, we use a reference dataset widely used in drug safety that contains labeled pairwise relationships between drugs and adverse events, and we attempt to rediscover these relationships from a corpus of 2.2 million NLP-processed free-text clinical notes. We employ standard adjustment and causal inference procedures to predict and estimate causal effects, informing the models with varying numbers of literature-derived confounders and instantiating the exposure, outcome, and confounder variables in the models with dichotomous EHR-derived data. Finally, we compare the results from applying these procedures with naive measures of association (χ² and reporting odds ratio) and with each other.
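As a rough illustration of two ideas in the Methods, the sketch below shows (1) a simple exact-match confounder search that keeps concepts the drug TREATS and that also CAUSE the adverse event, and (2) the reporting odds ratio computed from a 2×2 table of dichotomous counts. This is not the authors' implementation: the `Predication` tuple is a hypothetical, simplified stand-in for a SemMedDB predication row, the function names are invented for this example, and the concept names and counts are placeholders rather than study data.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a subject-predicate-object
# predication such as those stored in SemMedDB (assumption, not the real schema).
Predication = namedtuple("Predication", ["subject", "predicate", "object"])


def find_confounder_candidates(predications, drug, adverse_event):
    """Return concepts that the drug TREATS and that also CAUSE the adverse event."""
    treated_by_drug = {
        p.object for p in predications
        if p.subject == drug and p.predicate == "TREATS"
    }
    causes_of_event = {
        p.subject for p in predications
        if p.object == adverse_event and p.predicate == "CAUSES"
    }
    # A candidate confounder must satisfy both constraints.
    return sorted(treated_by_drug & causes_of_event)


def reporting_odds_ratio(a, b, c, d):
    """Reporting odds ratio from a 2x2 table of counts.

    a: exposed, outcome present      b: exposed, outcome absent
    c: unexposed, outcome present    d: unexposed, outcome absent
    """
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined when b or c is zero")
    return (a * d) / (b * c)


# Placeholder data for illustration only (not real drugs, conditions, or counts).
toy_predications = [
    Predication("drug_x", "TREATS", "condition_a"),
    Predication("drug_x", "TREATS", "condition_b"),
    Predication("condition_a", "CAUSES", "event_y"),
    Predication("condition_c", "CAUSES", "event_y"),
]

if __name__ == "__main__":
    print(find_confounder_candidates(toy_predications, "drug_x", "event_y"))
    print(reporting_odds_ratio(20, 80, 10, 90))
```

In this toy example only `condition_a` meets both constraints, so it is the sole candidate. The reporting odds ratio here is unadjusted, which is why the abstract treats it as a naive baseline for comparison with the confounder-adjusted models.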

Results and conclusions: We found semantic vector-based search to be superior to string-based search at reducing confounding bias. However, the effect of including more rather than fewer literature-derived confounders was inconclusive. We recommend using targeted learning estimation methods that can address treatment-confounder feedback, where confounders also behave as intermediate variables, and engaging subject-matter experts to adjudicate the handling of problematic covariates.

Keywords: Causal inference; Confounder selection; Confounding bias; Electronic health records; Pharmacovigilance.

Published by Elsevier Inc.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Bias
Causality
Models, Theoretical*
Pharmacovigilance*
Reproducibility of Results
