Accurate record linkage depends on the availability and quality of patient features such as first and last name. Privacy-preserving record linkage methods that use tokenization are sensitive to perturbations in the patient features used as inputs. In this study, we evaluated the impact of name transformations on the accuracy of patient matching using a large commercial dataset of 68 million records representing 59 million unique individuals. We implemented eight name transformation strategies and evaluated each one by its precision, recall, and F1 score. Transforming names to include the most common nicknames produced a significant gain in recall while maintaining precision, and yielded the highest F1 score (0.905, compared with 0.807 with no name transformation). Strategies tailored to transforming patient features can improve the precision and recall of patient matching and make it possible to create high-quality linked datasets for research.
©2022 AMIA - All rights reserved.
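The abstract does not specify the tokenization scheme, the nickname list, or how the eight strategies were implemented. The Python sketch below is therefore only an illustration under stated assumptions. It assumes that tokens are keyed SHA-256 hashes (HMAC) of the normalized first name, last name, and date of birth. It also uses a tiny hypothetical `NICKNAMES` table that maps a nickname to a formal name before hashing. This is one possible nickname transformation, not the study's method. The sketch shows why exact-match tokens treat "Bob" and "Robert" as different people, and how pairwise precision, recall, and F1 are computed against ground truth.

```python
import hashlib
import hmac
from itertools import combinations

# Hypothetical nickname-to-formal-name table (tiny illustrative subset,
# not the nickname list used in the study).
NICKNAMES = {
    "bob": "robert", "rob": "robert", "bobby": "robert",
    "bill": "william", "will": "william",
    "liz": "elizabeth", "beth": "elizabeth",
}

# Assumption: tokens are keyed SHA-256 hashes (HMAC) of normalized features.
SECRET_KEY = b"hypothetical-shared-secret"


def normalize(name: str) -> str:
    """Lowercase a name and keep only its alphabetic characters."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def make_token(first: str, last: str, dob: str, use_nicknames: bool) -> str:
    """Hash first|last|dob, optionally mapping a nickname to a formal name first."""
    f = normalize(first)
    if use_nicknames:
        f = NICKNAMES.get(f, f)
    message = f"{f}|{normalize(last)}|{dob}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def link(records, use_nicknames):
    """Return the set of (id, id) pairs whose tokens are identical."""
    tokens = {
        rid: make_token(first, last, dob, use_nicknames)
        for rid, first, last, dob, _person in records
    }
    return {
        (a, b)
        for a, b in combinations(sorted(tokens), 2)
        if tokens[a] == tokens[b]
    }


def precision_recall_f1(predicted, truth):
    """Pairwise precision, recall, and F1 of predicted links vs. true links."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy records: (record_id, first_name, last_name, date_of_birth, true_person_id).
RECORDS = [
    (1, "Robert", "Smith", "1980-01-02", "p1"),
    (2, "Bob", "Smith", "1980-01-02", "p1"),
    (3, "William", "Jones", "1975-06-30", "p2"),
    (4, "Bill", "Jones", "1975-06-30", "p2"),
    (5, "Robert", "Smith", "1990-03-15", "p3"),  # same name, different person
    (6, "Robert", "Smith", "1980-01-02", "p1"),
]

# Ground-truth pairs: every pair of records that belongs to the same person.
TRUTH = {(a[0], b[0]) for a, b in combinations(RECORDS, 2) if a[4] == b[4]}

if __name__ == "__main__":
    for flag in (False, True):
        p, r, f = precision_recall_f1(link(RECORDS, flag), TRUTH)
        print(f"nicknames={flag}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

On this toy data, the untransformed tokens link only the exact duplicates (records 1 and 6). The nickname mapping also links "Bob"/"Robert" and "Bill"/"William" with the same date of birth. Recall rises while precision stays at 1.0. These numbers come from made-up records and say nothing about the study's results.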