The Impact of Name Transformation on Match Rates Within a Large Consumer Database

AMIA Annu Symp Proc. 2023 Apr 29;2022:692-699. eCollection 2022.

Abstract

Accurate record linkage depends on the availability and quality of features such as first name and last name. Privacy-preserving record linkage methods that use tokenization are sensitive to perturbations in the patient features used as inputs. In this study, we evaluated the impact of name transformations on the accuracy of patient matching using a large commercial dataset. Using a set of 68 million records representing 59 million unique individuals, we implemented eight name transformation strategies and evaluated each by precision, recall, and F1 score. Transforming names to include the most common nicknames resulted in a significant gain in recall while maintaining precision, and generated the highest F1 score compared with no name transformation (0.905 vs 0.807). Strategies tailored to transforming patient features can improve the precision and recall of patient matching, and make it possible to create high-quality linked datasets for research purposes.
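
As an illustration only, the following minimal sketch shows why a nickname transformation can raise recall in token-based matching, and how precision, recall, and F1 are computed over matched record pairs. It is not the study's implementation: the nickname table, the choice to map nicknames to one canonical form, the SHA-256 token over name and date of birth, and all function names are assumptions made for this example.

```python
import hashlib

# Hypothetical nickname table; the mapping used in the study is not given here.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert", "liz": "elizabeth"}


def canonical_first_name(name):
    """Lowercase and trim a first name, then map known nicknames to one form."""
    cleaned = name.strip().lower()
    return NICKNAMES.get(cleaned, cleaned)


def make_token(first, last, dob, transform=True):
    """Hash the (optionally transformed) features into an opaque match token.

    Assumption: a plain SHA-256 over 'first|last|dob'. Real tokenization
    schemes typically add secret keys or salts.
    """
    first_key = canonical_first_name(first) if transform else first.strip().lower()
    payload = "|".join([first_key, last.strip().lower(), dob])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def precision_recall_f1(predicted_pairs, true_pairs):
    """Score predicted matching pairs against known true pairs (both sets)."""
    true_positives = len(predicted_pairs & true_pairs)
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because the token is an exact hash, "Bill Smith" and "William Smith" with the same date of birth produce different tokens unless the first name is transformed first; applying the mapping before hashing lets such records match, which is the mechanism by which recall can improve.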

MeSH terms

  • Data Management*
  • Databases, Factual
  • Humans
  • Medical Record Linkage / methods
  • Privacy*