Bayesian record linkage with variables in one file

Stat Med. 2023 Nov 30;42(27):4931-4951. doi: 10.1002/sim.9894. Epub 2023 Aug 31.

Abstract

In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms rely only on similarities between linking variables that appear in all the files. Moreover, analyses of linked files often ignore errors that may arise from incorrect or missed links. Bayesian record linkage methods allow for natural propagation of linkage error by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically and through simulations that the proposed method can improve the linking process and can result in accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare enrollment records.
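
The idea of letting file-specific variables inform the linkage can be illustrated with a minimal sketch. This is not the model from the paper. It assumes a Fellegi-Sunter-style log-likelihood ratio for the shared linking variables, plus a hypothetical normal linear regression relating a variable `x_a` seen only in file A to a variable `y_b` seen only in file B. All function names, parameters, and numeric values are illustrative assumptions.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def comparison_log_weight(agreements, m_probs, u_probs):
    """Log-likelihood ratio (match vs. non-match) from the shared linking variables.

    m_probs[k]: assumed P(field k agrees | records are a true match)
    u_probs[k]: assumed P(field k agrees | records are not a match)
    """
    total = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        total += math.log(m / u) if agree else math.log((1 - m) / (1 - u))
    return total


def association_log_weight(x_a, y_b, intercept, slope, resid_sd, y_mean, y_sd):
    """Extra log-likelihood ratio from variables exclusive to each file.

    Assumption: for a true match, y_b follows a regression on x_a;
    for a non-match, y_b follows its marginal distribution, unrelated to x_a.
    """
    matched = normal_logpdf(y_b, intercept + slope * x_a, resid_sd)
    unmatched = normal_logpdf(y_b, y_mean, y_sd)
    return matched - unmatched


# Hypothetical candidate pair: one linking field agrees, one disagrees.
m_probs = [0.95, 0.90]
u_probs = [0.10, 0.05]
agreements = [True, False]

base = comparison_log_weight(agreements, m_probs, u_probs)
extra = association_log_weight(
    x_a=2.0, y_b=4.1,
    intercept=0.0, slope=2.0, resid_sd=0.5,  # hypothetical regression for matches
    y_mean=0.0, y_sd=3.0,                    # hypothetical marginal for non-matches
)
combined = base + extra

print(f"linking variables only: {base:.3f}")
print(f"with association term:  {combined:.3f}")
```

In this toy pair the evidence from the linking variables is close to neutral, so the association term is what tips the pair toward a match. In a fully Bayesian treatment, the regression parameters would be sampled jointly with the linkage structure rather than fixed as they are here.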

Keywords: Bayesian; mixture models; multiple imputation; record linkage.

MeSH terms

  • Aged
  • Algorithms
  • Bayes Theorem
  • Humans
  • Medical Record Linkage* / methods
  • Medicare*
  • United States