Record linkage without patient identifiers: proof of concept using data from South Africa's national HIV program

Res Sq [Preprint]. 2023 May 15:rs.3.rs-2893943. doi: 10.21203/rs.3.rs-2893943/v1.

Abstract

Background: Linkage between health databases typically requires identifiers such as patient names and personal identification numbers. We developed and validated a record linkage strategy to combine administrative health databases without the use of patient identifiers, with application to South Africa's public sector HIV treatment program.

Methods: We linked CD4 counts and HIV viral loads from South Africa's HIV clinical monitoring database (TIER.Net) and the National Health Laboratory Service (NHLS) for patients receiving care between 2015 and 2019 in Ekurhuleni District (Gauteng Province). Linkage used a combination of variables related to lab results that are contained in both databases: result value, specimen collection date, facility of collection, patient year and month of birth, and sex. Exact matching required identical values on all linking variables, while caliper matching required identical values on all linking variables except the test date, which was allowed to differ by up to ± 5 days. We then developed a sequential linkage approach that applied specimen barcode matching first, then exact matching, and lastly caliper matching. Performance measures were sensitivity and positive predictive value (PPV), the share of patients linked across databases, and the percent increase in data points for each linkage approach.
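The sequential approach described in Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`barcode`, `value`, `facility`, `birth_ym`, `sex`, `date`), the function name `link_sequential`, the one-to-one linking rule, and the preference for the closest date within the caliper are all assumptions made for illustration, since the abstract does not specify them.

```python
from datetime import date

# Hypothetical field names; the real TIER.Net and NHLS schemas are not given here.
KEY_FIELDS = ("value", "facility", "birth_ym", "sex")


def link_sequential(tier, nhls, caliper_days=5):
    """Link records in three passes: barcode, then exact, then caliper.

    tier, nhls: lists of dicts, each describing one lab result.
    Returns {tier_index: (nhls_index, method)}.
    Assumption: each NHLS record is linked to at most one TIER.Net record.
    """
    links = {}
    used = set()  # NHLS indices that are already linked

    def try_link(i, candidates, method):
        # Take the first candidate that has not been linked yet.
        for j in candidates:
            if j not in used:
                links[i] = (j, method)
                used.add(j)
                return

    # Index NHLS records by barcode and by the non-date linking variables.
    by_barcode, by_key = {}, {}
    for j, rec in enumerate(nhls):
        if rec.get("barcode"):
            by_barcode.setdefault(rec["barcode"], []).append(j)
        by_key.setdefault(tuple(rec[f] for f in KEY_FIELDS), []).append(j)

    # Pass 1: specimen barcode matching.
    for i, rec in enumerate(tier):
        if rec.get("barcode"):
            try_link(i, by_barcode.get(rec["barcode"], []), "barcode")

    # Pass 2: exact matching on all linking variables, including the date.
    for i, rec in enumerate(tier):
        if i in links:
            continue
        key = tuple(rec[f] for f in KEY_FIELDS)
        same_day = [j for j in by_key.get(key, []) if nhls[j]["date"] == rec["date"]]
        try_link(i, same_day, "exact")

    # Pass 3: caliper matching, with dates allowed to differ by caliper_days.
    for i, rec in enumerate(tier):
        if i in links:
            continue
        key = tuple(rec[f] for f in KEY_FIELDS)

        def gap(j, d=rec["date"]):
            return abs((nhls[j]["date"] - d).days)

        near = [j for j in by_key.get(key, []) if gap(j) <= caliper_days]
        near.sort(key=gap)  # assumption: prefer the closest collection date
        try_link(i, near, "caliper")

    return links
```

Running the passes in this order means the stricter rule always gets the first claim on a record, so looser caliper links are only made for results that could not be linked by barcode or exact matching.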

Results: We attempted to link 2,017,290 lab results from TIER.Net (representing 523,558 unique patients) and 2,414,059 lab results from the NHLS database. Linkage performance was evaluated using specimen barcodes (available for a minority of records in TIER.Net) as a "gold standard". Exact matching achieved a sensitivity of 69.0% and PPV of 95.1%. Caliper matching achieved a sensitivity of 75.7% and PPV of 94.5%. In sequential linkage, we matched 41.9% of TIER.Net labs by specimen barcodes, 51.3% by exact matching, and 6.8% by caliper matching, for a total of 71.9% of labs matched, with a PPV of 96.8% and sensitivity of 85.9%. The sequential approach linked 86.0% of TIER.Net patients with at least one lab result to the NHLS database (N=1,450,087). Linkage to the NHLS Cohort increased the number of laboratory results associated with TIER.Net patients by 62.6%.
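As a rough illustration of how sensitivity and PPV can be computed against a barcode "gold standard", the sketch below uses the conventional definitions (sensitivity = TP / (TP + FN), PPV = TP / (TP + FP)). The abstract does not state the exact formulas used, so the function name `linkage_metrics`, the input format, and the choice to evaluate only records that have a gold-standard link are assumptions.

```python
def linkage_metrics(gold, predicted):
    """Compare predicted links with gold-standard links.

    gold, predicted: dicts mapping a TIER.Net record id to an NHLS record id.
    Assumption: predicted links for records without a gold-standard link
    cannot be judged right or wrong, so they are left out of the counts.
    """
    # True positives: predicted links that agree with the gold standard.
    tp = sum(1 for t, n in predicted.items() if gold.get(t) == n)
    # False positives: predicted links that contradict the gold standard.
    fp = sum(1 for t, n in predicted.items() if t in gold and gold[t] != n)
    # False negatives: gold-standard links that were missed or linked wrongly.
    fn = len(gold) - tp

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    ppv = tp / (tp + fp) if (tp + fp) else None
    return sensitivity, ppv
```

For example, with four gold-standard links of which two are recovered correctly and one is linked to the wrong record, the function returns a sensitivity of 0.5 and a PPV of 2/3.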

Conclusions: Linkage of TIER.Net and NHLS without patient identifiers attained high accuracy and yield without compromising patient privacy. The integrated cohort provides a more complete view of patients' lab history and could yield more accurate estimates of HIV program indicators.

Keywords: CD4 count; HIV; Record linkage; South Africa; Viral load.

Publication types

  • Preprint

Grants and funding

This work was supported by the National Institutes of Health [1 R01 AI 152149-01A1]. The paper’s contents are the responsibility of the authors and do not necessarily reflect the views of the funders. The funders had no role in the study design, collection, analysis and interpretation of the data, in manuscript preparation, or in the decision to publish.