Privacy-preserving record linkage in large databases using secure multiparty computation

BMC Med Genomics. 2018 Oct 11;11(Suppl 4):84. doi: 10.1186/s12920-018-0400-8.

Abstract

Background: Practical applications for data analysis may require combining multiple databases belonging to different owners, such as health centers. The analysis should be performed without violating the privacy of either the centers themselves or the patients whose records these centers store. To avoid biased analysis results, it may be important to remove duplicate records among the centers, so that each patient's data is taken into account only once. This task is very closely related to privacy-preserving record linkage.

Methods: This paper presents a solution to privacy-preserving deduplication among records of several databases using secure multiparty computation. It is built upon one of the fastest practical secure multiparty computation platforms, called Sharemind.

Results: Tests on ca. 10 million records of simulated databases, comprising 1000 health centers of 10000 records each, show that the computation is feasible in practice. The expected running time of the experiment is ca. 30 min for computing servers connected over a 100 Mbit/s WAN, the expected error of the results is 2^-40, and no errors were detected for the particular test set used for the benchmarks.

Conclusions: The solution is ready for practical use. It has well-defined security properties, implied by the properties of the Sharemind platform. The solution assumes that exact matching of records is required; a possible direction for future research is extending it to approximate matching.
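
Illustration: the exact-match deduplication task described above can be stated as a plain, non-private reference function. The sketch below is not the paper's protocol and provides no privacy; it only shows the input/output behavior that a secure computation would have to reproduce without revealing the records. The function name `deduplicate`, the representation of a record as a single hashable key, and the "first occurrence wins" rule are assumptions made for illustration.

```python
from typing import Hashable, List, Sequence


def deduplicate(centers: Sequence[Sequence[Hashable]]) -> List[List[bool]]:
    """Plaintext reference for exact-match deduplication (no privacy).

    `centers[i][j]` is the matching key of record j at center i
    (hypothetical representation). The result has the same shape and
    holds True where a record should be kept, i.e. where its key has
    not appeared in an earlier record of any center.
    """
    seen = set()   # keys already encountered across all centers
    keep = []      # one list of keep-flags per center
    for records in centers:
        center_flags = []
        for key in records:
            center_flags.append(key not in seen)
            seen.add(key)
        keep.append(center_flags)
    return keep


if __name__ == "__main__":
    # Three toy centers; "b", "c" and "a" each occur more than once.
    flags = deduplicate([["a", "b"], ["b", "c", "c"], ["a"]])
    print(flags)
```

In this formulation each patient key is counted exactly once over all centers, which is the property the Background paragraph asks for; a privacy-preserving solution would compute the same flags while keeping the keys hidden from the computing parties.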

Keywords: Deduplication; Privacy; Privacy-preserving record linkage; Secure multiparty computation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Security*
  • Databases, Factual*