Implementation and validation of a probabilistic linkage method for population databases without identification variables

Heliyon. 2022 Dec 14;8(12):e12311. doi: 10.1016/j.heliyon.2022.e12311. eCollection 2022 Dec.

Abstract

Linking records of the same person across different sources makes it possible to build administrative cohorts and perform longitudinal analyses as an alternative to traditional cohort studies, and has important practical implications for producing knowledge in public health. We applied the Fellegi-Sunter probabilistic linkage method to a sample of records from the Mexican Automated System for Hospital Discharges and the Statistical and Epidemiological System for Deaths and evaluated its performance. The records in each source were randomly divided into a training sample (25%) and a validation sample (75%). We evaluated different types of blocking in terms of complexity reduction and pairs completeness, and record linkage in terms of sensitivity and positive predictive value. In the validation sample, a blocking scheme based on trigrams of the full name achieved 95.76% pairs completeness and 99.9996% complexity reduction. After pair classification, we achieved a sensitivity of 90.72% and a positive predictive value of 97.10% in the validation sample. Both values were about one percentage point higher than those obtained with automatic classification without clerical review of potential pairs. We concluded that the linkage algorithm performed well in terms of sensitivity and positive predictive value and can be used to build administrative cohorts for the epidemiological analysis of populations with records in health information systems.

Keywords: Algorithm; Blocking; Hospital discharge; Information systems; Mortality; Probability; Record linkage.
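
The quantities reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy names, the `min_shared` threshold, and the m/u probabilities are all hypothetical, and the abstract does not specify the exact trigram blocking rule, so a simple "share at least one trigram" rule is assumed here. Complexity reduction is taken as one minus the share of all possible pairs retained by blocking, and pairs completeness as the share of true matches that survive blocking.

```python
from itertools import product
from math import log2


def trigrams(name):
    """Return the set of character trigrams of a name (lowercased, spaces removed)."""
    s = "".join(name.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def block_pairs(left, right, min_shared=1):
    """Candidate pairs (i, j) whose names share at least min_shared trigrams.

    Assumed blocking rule for illustration only. A real system would use an
    inverted index instead of comparing every pair.
    """
    return {
        (i, j)
        for (i, a), (j, b) in product(left.items(), right.items())
        if len(trigrams(a) & trigrams(b)) >= min_shared
    }


def complexity_reduction(n_candidates, n_left, n_right):
    """Fraction of the full cross product that blocking removes."""
    return 1 - n_candidates / (n_left * n_right)


def pairs_completeness(candidates, true_matches):
    """Fraction of true matches retained among the candidate pairs."""
    return len(candidates & true_matches) / len(true_matches)


def match_weight(agreements, m, u):
    """Fellegi-Sunter composite weight: sum of log2 likelihood ratios per field.

    m[f] = P(field agrees | pair is a match), u[f] = P(field agrees | non-match).
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += log2(m[field] / u[field])
        else:
            total += log2((1 - m[field]) / (1 - u[field]))
    return total


def sensitivity_ppv(predicted, true_matches):
    """Sensitivity (recall) and positive predictive value (precision) of links."""
    tp = len(predicted & true_matches)
    return tp / len(true_matches), tp / len(predicted)


# Hypothetical toy records: id -> full name.
left = {1: "maria lopez", 2: "juan perez", 3: "ana torres"}
right = {10: "maria lopes", 11: "juan peres", 12: "luis gomez"}
truth = {(1, 10), (2, 11)}

candidates = block_pairs(left, right)
cr = complexity_reduction(len(candidates), len(left), len(right))
pc = pairs_completeness(candidates, truth)

# Hypothetical m and u probabilities for two comparison fields.
m = {"name": 0.90, "dob": 0.95}
u = {"name": 0.01, "dob": 0.05}
w_agree = match_weight({"name": True, "dob": True}, m, u)
w_disagree = match_weight({"name": False, "dob": False}, m, u)

# Evaluate a hypothetical set of links accepted after classification.
sens, ppv = sensitivity_ppv({(1, 10), (3, 11)}, truth)
```

In the Fellegi-Sunter framework, pairs whose composite weight exceeds an upper threshold are accepted as links, those below a lower threshold are rejected, and those in between are candidates for clerical review; the thresholds themselves are not given in the abstract and are left out of the sketch.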