Implementation and evaluation of an efficient secure computation system using 'R' for healthcare statistics

J Am Med Inform Assoc. 2014 Oct;21(e2):e326-31. doi: 10.1136/amiajnl-2014-002631. Epub 2014 Apr 24.

Abstract

Background and objective: While the secondary use of medical data has gained attention, its adoption has been constrained by the need to protect patient privacy. Securing medical data by de-identification can be problematic, especially when the data concern rare diseases. More rigorous security management measures are therefore required.

Materials and methods: Using secure computation, an approach from cryptography, our system can compute various statistics over encrypted medical records without decrypting them. A drawback of secure computation is the immense processing time it requires. We implemented a system that securely computes healthcare statistics from the statistical computing software 'R' by effectively combining secret-sharing-based secure computation with original computation.
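
The secret-sharing idea mentioned above can be illustrated with a minimal sketch. It is not the system described here: it assumes a simple additive sharing scheme over a prime field, and the names PRIME, N_PARTIES, share, reconstruct, and secure_sum, as well as the claim amounts, are hypothetical and chosen only for illustration.

```python
import secrets

# Hypothetical parameters, chosen for illustration only.
PRIME = 2**61 - 1   # field modulus; must exceed any total we want to recover
N_PARTIES = 3       # number of computing servers


def share(value, n=N_PARTIES, p=PRIME):
    """Split an integer into n additive shares that sum to value mod p."""
    parts = [secrets.randbelow(p) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % p)
    return parts


def reconstruct(parts, p=PRIME):
    """Recombine shares; all n shares are needed to recover the value."""
    return sum(parts) % p


def secure_sum(values, n=N_PARTIES, p=PRIME):
    """Each server adds the shares it holds; only the total is reconstructed."""
    server_totals = [0] * n
    for v in values:
        for i, s in enumerate(share(v, n, p)):
            server_totals[i] = (server_totals[i] + s) % p
    return reconstruct(server_totals, p)


claims = [1200, 830, 4500, 220]   # made-up claim amounts
total = secure_sum(claims)
mean = total / len(claims)
```

In this sketch no single server ever holds an individual record in the clear; each sees only uniformly random shares, and only the aggregate total is opened.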

Results: Testing confirmed that our system could correctly compute the average and unbiased variance of approximately 50,000 records of dummy insurance claim data in a little over a second. Computations involving conditional expressions and/or comparison of values, such as the t test and median, could also be correctly completed in several tens of seconds to a few minutes.
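
As a rough illustration of why the average and unbiased variance can be cheap under secret sharing, the sketch below assumes (this is an assumption, not a description of the system evaluated here) that each data owner shares both x and x^2, so the servers only perform local additions and the final arithmetic is done on two opened totals. All names (PRIME, N_PARTIES, share, shared_total, unbiased_variance) and the data are hypothetical.

```python
import secrets
import statistics

# Hypothetical parameters; the modulus must exceed the sum of squares.
PRIME = 2**61 - 1
N_PARTIES = 3


def share(value, n=N_PARTIES, p=PRIME):
    """Split an integer into n additive shares that sum to value mod p."""
    parts = [secrets.randbelow(p) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % p)
    return parts


def shared_total(values, n=N_PARTIES, p=PRIME):
    """Servers add their shares locally; only the grand total is opened."""
    totals = [0] * n
    for v in values:
        for i, s in enumerate(share(v, n, p)):
            totals[i] = (totals[i] + s) % p
    return sum(totals) % p


def unbiased_variance(values):
    """Unbiased sample variance from the opened totals of x and x^2."""
    n = len(values)
    s1 = shared_total(values)                    # sum of x
    s2 = shared_total([v * v for v in values])   # sum of x^2 (squared by the data owner)
    # Plain arithmetic on the two opened totals: (n*S2 - S1^2) / (n*(n-1)).
    return (n * s2 - s1 * s1) / (n * (n - 1))


claims = [1200, 830, 4500, 220]            # made-up claim amounts
variance = unbiased_variance(claims)
reference = statistics.variance(claims)    # plain, non-secure reference value
```

Statistics that need comparisons between secret values, such as the median, cannot be reduced to local additions in this way, which is consistent with the longer running times reported for them.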

Discussion: If medical records are simply encrypted, a risk of leaks remains because decryption is usually required during statistical analysis. Our system provides a high level of security because medical records remain encrypted even during statistical analysis. Moreover, although secure computation protocols generally require a significant amount of processing time, our system can securely compute some basic statistics involving conditional expressions interactively from 'R'.

Conclusions: We propose a secure statistical analysis system using 'R' for medical data that effectively integrates secret-sharing-based secure computation and original computation.

Keywords: healthcare statistics; insurance database; privacy preserving data mining (PPDM); secret sharing scheme; secure computation.

MeSH terms

  • Computer Security*
  • Computer Systems
  • Delivery of Health Care / statistics & numerical data
  • Electronic Health Records*
  • Statistics as Topic*