Preserving Patient Privacy When Sharing Same-Disease Data

Xiaoping Liu; Xiao-Bai Li; Luvai Motiwalla; Wenjun Li; Hua Zheng; Patricia D Franklin

doi:10.1145/2956554

ACM J Data Inf Qual. 2016 Oct;7(4):17. doi: 10.1145/2956554.

Authors

Xiaoping Liu¹, Xiao-Bai Li², Luvai Motiwalla³, Wenjun Li⁴, Hua Zheng⁵, Patricia D Franklin⁶

Affiliations

¹ Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, MA 01854; Xiaoping_Liu@student.uml.edu.
² Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, MA 01854.
³ Department of Operations and Information Systems, University of Massachusetts Lowell, Lowell, MA 01854; Luvai_Motiwalla@uml.edu.
⁴ Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655; Wenjun.Li@umassmed.edu.
⁵ Department of Orthopedics and Physical Rehabilitation, University of Massachusetts Medical School, Worcester, MA 01655; Hua.Zheng@umassmed.edu.
⁶ Department of Orthopedics and Physical Rehabilitation, University of Massachusetts Medical School, Worcester, MA 01655; Patricia.Franklin@umassmed.edu.

PMID: 27867450
PMCID: PMC5111902
DOI: 10.1145/2956554

Abstract

Medical and health data are often collected to study a specific disease. For such same-disease microdata, a privacy disclosure occurs as soon as an individual is known to be in the microdata. Individuals in same-disease microdata are thus subject to higher disclosure risk than those in microdata covering different diseases. This important problem has been overlooked in data-privacy research and practice; no prior study has addressed it. In this study, we analyze the disclosure risk for individuals in same-disease microdata and propose a new metric appropriate for measuring disclosure risk in this situation. An efficient algorithm is designed and implemented for anonymizing same-disease data to minimize disclosure risk while preserving as much data utility as possible. An experimental study was conducted on real patient and population data. The results show that traditional reidentification risk measures underestimate the actual disclosure risk for individuals in same-disease microdata and demonstrate that the proposed approach is very effective in reducing the actual risk for same-disease data. This study suggests that privacy protection policy and practice for sharing medical and health data should consider not only individuals' identifying attributes but also the health and disease information contained in the data. It is recommended that data-sharing entities employ a statistical approach, instead of HIPAA's Safe Harbor policy, when sharing same-disease microdata.
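The gap between reidentification risk and disclosure by mere presence can be illustrated with a toy calculation. The sketch below is not the metric proposed in the paper, which the abstract does not define. It assumes a simple presence-style measure: for each group of records sharing the same quasi-identifier values, the fraction of matching population members who appear in the same-disease sample. The function name `risk_by_group`, the quasi-identifier tuples, and all counts are hypothetical.

```python
from collections import Counter


def risk_by_group(sample_qis, population_qis):
    """Return {qi: (reid_risk, presence_risk)} for each quasi-identifier group.

    Illustrative assumption, not the paper's metric:
      reid_risk     = 1 / f  (chance of picking the right record in the sample group)
      presence_risk = f / F  (chance that a matching population member is in the sample)
    where f and F are the group's sizes in the sample and in the population.
    """
    sample_counts = Counter(sample_qis)
    population_counts = Counter(population_qis)
    risks = {}
    for qi, f in sample_counts.items():
        F = population_counts[qi]
        if F < f:
            raise ValueError(f"sample group {qi!r} is larger than its population group")
        risks[qi] = (1 / f, f / F)
    return risks


# Hypothetical quasi-identifiers: (region, age band).
population = [("A", "60-69")] * 5 + [("B", "40-49")] * 20
sample = [("A", "60-69")] * 5 + [("B", "40-49")] * 2

risks = risk_by_group(sample, population)
for qi, (reid, presence) in risks.items():
    print(qi, f"reid={reid:.2f}", f"presence={presence:.2f}")
```

In this toy data, all five population members in group ("A", "60-69") appear in the sample. Knowing that someone belongs to that group therefore reveals that they have the disease, even though a record-level reidentification measure reports a risk of only 0.2 for the same group.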

Keywords: Data sharing; HIPAA; disclosure risk.

Grants and funding

R01 LM010942/LM/NLM NIH HHS/United States