A comparison of accuracy and computational feasibility of two record linkage algorithms in retrieving vital status information from HIV/AIDS patients registered in Brazilian public databases

Int J Med Inform. 2018 Jun:114:45-51. doi: 10.1016/j.ijmedinf.2018.03.005. Epub 2018 Mar 20.

Abstract

Background and objective: While cross-referencing information from people living with HIV/AIDS (PLWHA) against the official mortality database is a critical step in monitoring the HIV/AIDS epidemic in Brazil, inaccuracies in the linkage routine may compromise the validity of the final database and yield biased epidemiological estimates. We compared the accuracy and the total runtime of two linkage algorithms applied to retrieve vital status information for PLWHA from Brazilian public databases.

Methods: Nominally identified records from PLWHA were obtained from three distinct government databases. Linkage routines included an algorithm written in Python (PLA) and Reclink software (RlS), a probabilistic linkage program widely used in Brazil. Records from PLWHA known to be alive were added to those from patients reported as deceased. These records were then searched for in the mortality system. Scenarios in which 5% and 50% of patients were actually dead were simulated, considering both complete cases and 20% missing maternal names.
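The simulated scenarios described above can be sketched as follows. This is a minimal illustration, not the authors' routine: the helper `build_scenario`, the record fields (`name`, `mother_name`, `birth_date`), the `truly_dead` flag and the random blanking of maternal names are all assumptions made for the example, since the abstract does not state how the scenarios were constructed.

```python
import random


def build_scenario(deceased, alive, death_proportion,
                   missing_mother_rate=0.0, seed=0):
    """Mix deceased and alive records so that `death_proportion` of the
    resulting set is deceased (hypothetical helper, for illustration only).

    Each record is a dict with "name", "mother_name" and "birth_date".
    A "truly_dead" flag is added as the ground truth for later evaluation.
    """
    rng = random.Random(seed)

    # Number of alive records needed to reach the requested death proportion.
    n_alive = round(len(deceased) * (1 - death_proportion) / death_proportion)
    if n_alive > len(alive):
        raise ValueError("not enough alive records for this death proportion")

    # Copy the records so the input lists are left untouched.
    sample = [dict(r, truly_dead=True) for r in deceased]
    sample += [dict(r, truly_dead=False) for r in rng.sample(alive, n_alive)]

    # Blank the maternal name in a fixed share of records (assumed mechanism).
    n_missing = round(len(sample) * missing_mother_rate)
    for record in rng.sample(sample, n_missing):
        record["mother_name"] = None

    rng.shuffle(sample)
    return sample
```

Calling `build_scenario(deceased, alive, 0.05, missing_mother_rate=0.2)` would correspond to the scenario with 5% deaths and 20% missing maternal names; the resulting list is what a linkage routine would then search for in the mortality system.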

Results: When complete information was available, both algorithms had comparable accuracies. In the scenario of 20% missing maternal names, PLA and RlS had sensitivities of 94.5% and 94.6% (p > 0.5), respectively; after manual review, PLA sensitivity increased to 98.4% (96.6-100.0), exceeding that of RlS (p < 0.01). PLA had a higher positive predictive value in the 5% death proportion scenario. Manual review was intrinsically required by RlS for up to 14% of the records of people actually dead, whereas the corresponding proportion ranged from 1.5% to 2% for PLA. The lack of manual inspection did not alter PLA sensitivity when complete information was available. When incomplete data were available, PLA sensitivity increased from 94.5% to 98.4% with manual inspection, thus exceeding that of RlS (94.6%, p < 0.05). RlS required considerably less processing time than PLA.
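The two accuracy measures reported above follow standard definitions: sensitivity is the share of truly deceased patients that the linkage finds, and positive predictive value (PPV) is the share of reported matches that are truly deceased. The sketch below shows why PPV depends on the death proportion of the scenario while sensitivity does not. The sensitivity and specificity values used (0.95 and 0.99) and the cohort size are invented for illustration and are not results from the study.

```python
def sensitivity(tp, fn):
    """True positives over all truly deceased patients."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """True positives over all records reported as matched."""
    return tp / (tp + fp)


def expected_counts(n, death_proportion, sens, spec):
    """Expected true positives, false negatives and false positives for a
    linkage with the given sensitivity and specificity (assumed values)."""
    dead = n * death_proportion
    alive = n - dead
    tp = dead * sens
    fn = dead - tp
    fp = alive * (1 - spec)
    return tp, fn, fp


# Same hypothetical linkage quality, evaluated at both death proportions.
results = {}
for proportion in (0.05, 0.50):
    tp, fn, fp = expected_counts(10_000, proportion, sens=0.95, spec=0.99)
    results[proportion] = (sensitivity(tp, fn), ppv(tp, fp))
```

With these assumed inputs the sensitivity stays at 0.95 in both scenarios, while the PPV is about 0.83 at a 5% death proportion and about 0.99 at 50%. False matches therefore weigh more heavily when few patients are actually dead.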

Conclusion: Both linkage algorithms presented interchangeable accuracies in retrieving vital status data for PLWHA. RlS had a considerably shorter runtime but intrinsically required manual review of a substantial proportion of the matched records. PLA, on the other hand, took considerably longer to run but spared manual review at no cost to accuracy.

Keywords: Deterministic linkage; HIV; Mortality; Probabilistic linkage; Public datasets; Record linkage.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acquired Immunodeficiency Syndrome / epidemiology
  • Acquired Immunodeficiency Syndrome / mortality*
  • Algorithms*
  • Brazil / epidemiology
  • Databases, Factual / standards*
  • Databases, Factual / statistics & numerical data
  • Electronic Health Records / standards*
  • Feasibility Studies
  • HIV / isolation & purification*
  • Humans
  • Medical Record Linkage / methods*
  • Software