Remote access methods for exploratory data analysis and statistical modelling: Privacy-Preserving Analytics

Comput Methods Programs Biomed. 2008 Sep;91(3):208-22. doi: 10.1016/j.cmpb.2008.04.001. Epub 2008 May 20.

Abstract

This paper addresses the challenge of enabling the use of confidential or private data for research and policy analysis while protecting confidentiality and privacy by reducing the risk of disclosure of sensitive information. Traditional solutions to the problem of reducing disclosure risk include releasing de-identified data and modifying data before release. We discuss the alternative approach of using a remote analysis server, which does not release any data but instead is designed to deliver useful results of user-specified statistical analyses with a low risk of disclosure. The techniques described in this paper enable a user to apply a wide range of methods in exploratory data analysis, regression and survival analysis, while reducing the risk that the user can read or infer any individual record attribute value. We illustrate our methods with examples from biostatistics using publicly available data. We have implemented our techniques in a software demonstrator called Privacy-Preserving Analytics (PPA), via a web-based interface to the R software. We believe that PPA may provide an effective balance between the competing goals of providing useful information and reducing disclosure risk in some situations.
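
The general idea of a remote analysis server, in which the user submits an analysis request and receives only aggregate results rather than records, can be illustrated with a minimal Python sketch. This is not the PPA implementation and does not reproduce the techniques used in the paper, which the abstract does not detail. The records, the function name `remote_group_means`, and the minimum cell size rule `MIN_CELL_SIZE` are all hypothetical, chosen only to show one simple way a server might withhold results that rest on very few individuals.

```python
from statistics import mean

# Hypothetical confidential records held on the server.
# They are never returned to the user.
_RECORDS = [
    {"group": "A", "age": 34},
    {"group": "A", "age": 41},
    {"group": "A", "age": 29},
    {"group": "B", "age": 57},
]

# Assumed suppression threshold for illustration; not taken from the paper.
MIN_CELL_SIZE = 3


def remote_group_means(variable, by):
    """Return per-group counts and means, suppressing small groups."""
    groups = {}
    for row in _RECORDS:
        groups.setdefault(row[by], []).append(row[variable])

    result = {}
    for key, values in groups.items():
        if len(values) >= MIN_CELL_SIZE:
            result[key] = {"n": len(values), "mean": mean(values)}
        else:
            # Too few records: a mean here could reveal an individual value.
            result[key] = None
    return result


# The user sees only the aggregate summary, never the underlying rows.
summary = remote_group_means("age", by="group")
print(summary)
```

In this sketch group B contains a single record, so its mean would equal that individual's value and is withheld. A real system would need considerably more than a cell-size rule to control disclosure risk across regression and survival analyses.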

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Australia
  • Computer Security*
  • Computer Simulation
  • Confidentiality*
  • Data Interpretation, Statistical*
  • Database Management Systems*
  • Information Storage and Retrieval / methods*
  • Internet
  • Models, Biological*
  • Models, Statistical*
  • Software*