Cox regression with linked data

Stat Med. 2024 Jan 30;43(2):296-314. doi: 10.1002/sim.9960. Epub 2023 Nov 20.

Abstract

Record linkage is increasingly used, especially in medical studies, to combine data from different databases that refer to the same entities. Linked data can give analysts novel and valuable information that cannot be obtained from any single database. However, linkage errors are usually unavoidable, whatever record linkage method is used, and ignoring them may lead to biased estimates. Although several methods have been developed to handle linkage errors in generalized linear models, the Cox regression model has received little attention, even though it is one of the most important statistical models in clinical and epidemiological research. In this work, we propose an adjusted estimating equation for secondary Cox regression analysis, in which the linked data have been prepared by a third-party operator and no information on the matching variables is available to the analyst. A Monte Carlo simulation study shows that the proposed method substantially reduces the bias that false links introduce into the estimated parameters of the Cox model. An asymptotically unbiased variance estimator for the adjusted estimators of the Cox regression coefficients is also proposed. Finally, the proposed method is applied to a linked database from the Brest stroke registry in France.
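As a rough illustration of the problem described above, the sketch below solves the standard, unadjusted Cox partial-likelihood score equation U(beta) = sum over events i of [x_i - xbar(beta, t_i)] = 0 for a single covariate by Newton-Raphson, then refits after corrupting a fraction of the links. It is not the adjusted estimating equation proposed in the paper. The simulation design, the false-link mechanism (each record is linked, with probability `false_rate`, to a different, randomly chosen record's covariate) and all function names (`cox_score_and_info`, `fit_cox`, `simulate`, `introduce_false_links`) are hypothetical assumptions made for illustration only.

```python
import math
import random


def cox_score_and_info(beta, times, events, x):
    """Score U(beta) and observed information I(beta) of the Cox partial
    likelihood for a single covariate (Breslow convention for ties).
    Naive O(n^2) loop over risk sets, written for clarity, not speed."""
    n = len(times)
    score, info = 0.0, 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set at t_i: everyone still under observation.
        risk = [j for j in range(n) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        xbar = s1 / s0  # weighted mean covariate in the risk set
        score += x[i] - xbar
        info += s2 / s0 - xbar ** 2  # weighted variance in the risk set
    return score, info


def fit_cox(times, events, x, beta=0.0, tol=1e-10, max_iter=50):
    """Solve the unadjusted estimating equation U(beta) = 0 by Newton-Raphson."""
    for _ in range(max_iter):
        u, info = cox_score_and_info(beta, times, events, x)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def simulate(n, beta_true, rng):
    """Hypothetical design: exponential event times with hazard
    exp(beta_true * x) and independent exponential censoring (rate 0.5)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    t = [rng.expovariate(math.exp(beta_true * xi)) for xi in x]
    c = [rng.expovariate(0.5) for _ in range(n)]
    times = [min(ti, ci) for ti, ci in zip(t, c)]
    events = [ti <= ci for ti, ci in zip(t, c)]
    return times, events, x


def introduce_false_links(x, false_rate, rng):
    """Hypothetical linkage-error mechanism: with probability false_rate, a
    record's covariate is replaced by that of a different, randomly chosen record."""
    n = len(x)
    linked = list(x)
    for i in range(n):
        if rng.random() < false_rate:
            j = rng.randrange(n - 1)
            linked[i] = x[j if j < i else j + 1]  # never pick record i itself
    return linked


if __name__ == "__main__":
    rng = random.Random(2024)
    times, events, x = simulate(400, 1.0, rng)
    print("correct links:", round(fit_cox(times, events, x), 3))
    x_linked = introduce_false_links(x, 0.3, rng)
    print("30% false links:", round(fit_cox(times, events, x_linked), 3))
```

In runs like this, the naive estimate computed from falsely linked covariates is typically pulled toward zero; this attenuation is the kind of bias an adjusted estimating equation aims to remove.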

Keywords: Cox regression; adjusted estimating equation; linkage error; secondary analysis; variance estimation.

MeSH terms

  • Bias
  • Computer Simulation
  • Data Interpretation, Statistical
  • Humans
  • Linear Models
  • Models, Statistical*
  • Regression Analysis
  • Semantic Web*