A Scalable Privacy-preserving Data Generation Methodology for Exploratory Analysis

Jaideep Vaidya; Basit Shafiq; Muazzam Asani; Nabil Adam; Xiaoqian Jiang; Lucila Ohno-Machado

AMIA Annu Symp Proc. 2018 Apr 16;2017:1695-1704. eCollection 2017.

Authors

Jaideep Vaidya¹, Basit Shafiq², Muazzam Asani², Nabil Adam¹, Xiaoqian Jiang³, Lucila Ohno-Machado³

Affiliations

¹ Rutgers University, Newark, NJ, USA.
² Lahore University of Management Sciences, Lahore, Punjab, Pakistan.
³ University of California at San Diego, La Jolla, CA, USA.

PMID: 29854240
PMCID: PMC5977652

Abstract

Big data coupled with precision medicine has the potential to significantly improve our understanding and treatment of complex disorders such as cancer, diabetes, and depression. However, the essential problem is that data are stuck in silos, and it is difficult to identify precisely which data would be relevant and useful for a particular type of analysis. Acquiring and accessing biomedical data requires significant effort, yet in many cases the data may not provide much insight into the problem at hand. There is therefore a need to measure the utility or relevance of additional datasets for a particular biomedical research task without direct access to the data. To this end, we develop a privacy-preserving approach to create synthetic data that can provide a first-order approximation of utility. We evaluate the proposed approach with several biomedical datasets in the context of regression and classification tasks and discuss how it can be incorporated into existing data management systems such as REDCap.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Big Data
Biomedical Research*
Computer Security*
Datasets as Topic*
Humans
Privacy*
