Encoding of Numerical Data for Privacy-Preserving Record Linkage

Stud Health Technol Inform. 2020 Jun 23:271:23-30. doi: 10.3233/SHTI200070.

Abstract

Background: Privacy-preserving record linkage (PPRL) is the process of detecting dataset entries that refer to the same individual within two independent datasets, without disclosing any personal information. Although applied in various fields, it has become particularly important in the medical sector. Bloom filters are a popular PPRL method. However, they were originally used for encoding strings only.
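
As an illustration of the conventional string-based approach, the following is a minimal sketch, assuming padded character bigrams, double hashing with SHA-1 and SHA-256, and the Dice coefficient as the similarity measure. The function names, the filter size, and the number of hash functions are hypothetical choices for this example and are not taken from the paper; a real deployment would typically use keyed hash functions.

```python
import hashlib

def bigrams(value):
    """Split a string into padded character bigrams (assumed tokenization)."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def bloom_encode(tokens, size=1000, num_hashes=10):
    """Return the set of bit positions set to 1 for the given tokens.

    size and num_hashes are illustrative values, not the paper's parameters.
    """
    bits = set()
    for token in tokens:
        data = token.encode("utf-8")
        h1 = int.from_bytes(hashlib.sha1(data).digest(), "big")
        h2 = int.from_bytes(hashlib.sha256(data).digest(), "big")
        # Double hashing: derive num_hashes positions from two base hashes.
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits

def dice(a, b):
    """Dice coefficient of two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

smith = bloom_encode(bigrams("smith"))
smyth = bloom_encode(bigrams("smyth"))
jones = bloom_encode(bigrams("jones"))
print(dice(smith, smyth))  # similar names share most bigrams
print(dice(smith, jones))  # unrelated names share none
```

Because similar strings share most of their bigrams, their filters overlap heavily, which is what makes approximate matching possible without exchanging the plain-text values.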

Objectives: This paper evaluates an encoding method specifically designed for numerical data and adjusts it for encoding geocoordinates in Bloom filters.
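
The abstract does not spell out the encoding, so the following is only a sketch of one common way to encode numerical values in Bloom filters: each value is mapped to a grid cell, and the cell plus a fixed number of neighbouring cells are hashed as tokens, so that values lying close together share most of their tokens. The grid step, the neighbour count, the token prefixes, and all function names are assumptions made for this example and may differ from the method evaluated in the paper.

```python
import hashlib

def neighbour_tokens(value, step, neighbours, prefix):
    """Tokens for the grid cell of value and `neighbours` cells on each side."""
    centre = round(value / step)
    return {f"{prefix}:{centre + k}" for k in range(-neighbours, neighbours + 1)}

def bloom_encode(tokens, size=1000, num_hashes=10):
    """Return the set of bit positions set to 1 (illustrative parameters)."""
    bits = set()
    for token in tokens:
        data = token.encode("utf-8")
        h1 = int.from_bytes(hashlib.sha1(data).digest(), "big")
        h2 = int.from_bytes(hashlib.sha256(data).digest(), "big")
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits

def encode_point(lat, lon, step=0.001, neighbours=5):
    """Encode latitude and longitude into one filter (hypothetical scheme)."""
    tokens = neighbour_tokens(lat, step, neighbours, "lat")
    tokens |= neighbour_tokens(lon, step, neighbours, "lon")
    return bloom_encode(tokens)

def dice(a, b):
    """Dice coefficient of two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

point_a = encode_point(48.137, 11.575)
point_b = encode_point(48.138, 11.576)  # one grid step away in each axis
point_c = encode_point(52.520, 13.405)  # several degrees away
print(dice(point_a, point_b))
print(dice(point_a, point_c))
```

In this sketch the nearby pair shares 20 of its 22 tokens and therefore scores much higher than the distant pair, whose overlap comes only from chance hash collisions. A string encoding of the same coordinates would instead compare digit bigrams, which does not reflect numerical distance.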

Methods: The proposed numerical encoding of geocoordinates is compared with the string-based method using synthetic data.

Results: The proposed method for encoding geocoordinates in Bloom filters attains higher recall and precision than the conventional string encoding.
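
For reference, recall and precision in a linkage setting can be computed from the set of record pairs classified as matches and the set of true matches. The sketch below uses made-up record identifiers purely for illustration; it does not reproduce any data or results from the paper.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted match pairs against true match pairs."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical pairs of (record id in dataset A, record id in dataset B).
actual_matches = {(1, 1), (2, 2), (3, 3)}
predicted_matches = {(1, 1), (2, 2), (4, 5)}

precision, recall = precision_recall(predicted_matches, actual_matches)
print(precision, recall)
```

Precision is the share of predicted matches that are correct, and recall is the share of true matches that were found.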

Conclusion: Numerical encoding has the potential to increase the record linkage quality of Bloom filters, as well as their privacy level.

Keywords: algorithms; computerized; confidentiality; medical record linkage; medical records system; personally identifiable information; privacy.

MeSH terms

  • Computer Security
  • Confidentiality
  • Medical Record Linkage
  • Medical Records Systems, Computerized
  • Names
  • Privacy*
  • Records