Improved generalized raking estimators to address dependent covariate and failure-time outcome error

Eric J Oh; Bryan E Shepherd; Thomas Lumley; Pamela A Shaw

doi:10.1002/bimj.202000187

Biom J. 2021 Jun;63(5):1006-1027. doi: 10.1002/bimj.202000187. Epub 2021 Mar 11.

Authors

Eric J Oh¹, Bryan E Shepherd², Thomas Lumley³, Pamela A Shaw¹

Affiliations

¹ Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
² Department of Biostatistics, Vanderbilt University, Nashville, TN, USA.
³ Department of Statistics, University of Auckland, Auckland, New Zealand.

Abstract

Biomedical studies that use electronic health records (EHR) data for inference are often subject to bias due to measurement error. The measurement error present in EHR data is typically complex, consisting of errors of unknown functional form in covariates and the outcome, which can be dependent. To address the bias resulting from such errors, generalized raking has recently been proposed as a robust method that yields consistent estimates without the need to model the error structure. We provide rationale for why these previously proposed raking estimators can be expected to be inefficient in failure-time outcome settings involving misclassification of the event indicator. We propose raking estimators that utilize multiple imputation, to impute either the target variables or auxiliary variables, to improve the efficiency. We also consider outcome-dependent sampling designs and investigate their impact on the efficiency of the raking estimators, either with or without multiple imputation. We present an extensive numerical study to examine the performance of the proposed estimators across various measurement error settings. We then apply the proposed methods to our motivating setting, in which we seek to analyze HIV outcomes in an observational cohort with EHR data from the Vanderbilt Comprehensive Care Clinic.

Keywords: electronic health records; generalized raking; measurement error; misclassification; survival analysis.
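
As a rough illustration of the calibration idea behind generalized raking, the sketch below adjusts the design weights of a validated subsample so that weighted totals of an auxiliary variable match the totals known for the full cohort. It is not the authors' estimator: the toy cohort, the simple random subsample, the use of the raw error-prone covariate as the auxiliary variable, and the names rake_weights, x_true, and x_star are all assumptions made for this example.

```python
import numpy as np

def rake_weights(design_w, aux_sample, aux_totals, tol=1e-8, max_iter=50):
    """Raking calibration: find w_i = d_i * exp(x_i' lam) such that
    sum_i w_i * x_i equals the known cohort totals of the auxiliaries."""
    d = np.asarray(design_w, dtype=float)
    X = np.asarray(aux_sample, dtype=float)
    t = np.asarray(aux_totals, dtype=float)
    lam = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(X @ lam)
        resid = X.T @ w - t              # calibration equations at current lam
        if np.max(np.abs(resid)) < tol:
            break
        jac = X.T @ (w[:, None] * X)     # Jacobian of the calibration equations
        lam -= np.linalg.solve(jac, resid)  # Newton step
    return d * np.exp(X @ lam)

# Hypothetical toy cohort (assumption, not data from the study).
rng = np.random.default_rng(0)
N, n = 2000, 400
x_true = rng.normal(size=N)                       # validated value, phase 2 only
x_star = x_true + rng.normal(scale=0.5, size=N)   # error-prone value, whole cohort

# Simple random validation subsample, so every design weight is N / n.
in_phase2 = np.zeros(N, dtype=bool)
in_phase2[rng.choice(N, size=n, replace=False)] = True
design_w = np.full(n, N / n)

# Auxiliaries: an intercept and the error-prone covariate, known for everyone.
aux_all = np.column_stack([np.ones(N), x_star])
aux_totals = aux_all.sum(axis=0)

w = rake_weights(design_w, aux_all[in_phase2], aux_totals)

# Raked estimate of the mean of the validated covariate.
raked_mean = np.sum(w * x_true[in_phase2]) / np.sum(w)
```

The calibrated weights reproduce the cohort size and the cohort total of x_star exactly, which is how information from the unvalidated records enters the estimate without a model for the error structure. The choice of auxiliary variable here is only a placeholder; the abstract's proposal concerns constructing better auxiliary variables, for example through multiple imputation.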
