Efficient Protection of Health Data from Sensitive Attribute Disclosure

Raffael Bild; Johanna Eicher; Fabian Prasser

doi:10.3233/SHTI200149

Stud Health Technol Inform. 2020 Jun 16:270:193-197. doi: 10.3233/SHTI200149.

Authors

Raffael Bild¹, Johanna Eicher¹, Fabian Prasser²,³

Affiliations

¹ University Hospital rechts der Isar, Technical University of Munich, Germany.
² Charité - Universitätsmedizin Berlin, Berlin, Germany.
³ Berlin Institute of Health (BIH), Berlin, Germany.

PMID: 32570373
DOI: 10.3233/SHTI200149

Abstract

Biomedical research has become data-driven. To create the required big datasets, health data needs to be shared or reused out of the context of its initial purpose. This leads to significant privacy challenges. Data anonymization is an important protection method where data is transformed such that privacy guarantees can be provided according to formal models. For applications in practice, anonymization methods need to be integrated into scalable and robust tools. In this work, we focus on the problem of scalability. Protecting biomedical data from inference attacks is challenging, in particular for numeric data. An important privacy model in this context is t-closeness, which has also been defined for attribute values which are totally ordered. However, directly implementing a scalable algorithmic representation of the mathematical definition of the model proves difficult. In this paper we therefore present a series of optimizations that can be used to achieve efficiency in production use. An experimental evaluation shows that our approach reduces execution times of anonymization processes involving t-closeness by up to a factor of two.
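
For orientation, t-closeness on a totally ordered attribute is commonly checked with the ordered-distance form of the Earth Mover's Distance, D[P, Q] = 1/(m-1) · Σ_i |Σ_{j≤i} (p_j − q_j)|, where P is the distribution of the sensitive attribute within one equivalence class and Q is its distribution in the whole dataset. The following is a minimal, unoptimized sketch of that textbook definition. It is not the optimized implementation evaluated in the paper, and the function names (`ordered_emd`, `distribution`, `satisfies_t_closeness`) and the example data are assumptions made for illustration.

```python
from collections import Counter
from itertools import accumulate


def ordered_emd(p, q):
    """Ordered-distance Earth Mover's Distance between two distributions
    given as probability lists over the same sorted domain of m values."""
    m = len(p)
    if m != len(q):
        raise ValueError("distributions must cover the same domain")
    if m < 2:
        return 0.0  # a single-valued domain admits only one distribution
    # Running sums of (p_j - q_j); each term is the mass moved past value i.
    prefix_sums = accumulate(pi - qi for pi, qi in zip(p, q))
    return sum(abs(s) for s in prefix_sums) / (m - 1)


def distribution(values, domain):
    """Relative frequency of each domain value within `values`."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in domain]


def satisfies_t_closeness(classes, t):
    """True if every equivalence class lies within distance t of the
    overall distribution. `classes` is a list of lists of sensitive values."""
    all_values = [v for cls in classes for v in cls]
    domain = sorted(set(all_values))
    overall = distribution(all_values, domain)
    return all(
        ordered_emd(distribution(cls, domain), overall) <= t
        for cls in classes
    )


# Hypothetical example: nine records with distinct numeric sensitive values,
# grouped into three equivalence classes.
example_classes = [[3, 4, 5], [6, 8, 11], [7, 9, 10]]
print(satisfies_t_closeness(example_classes, t=0.4))
```

Recomputing both distributions for every candidate grouping, as this sketch does, is the kind of direct translation of the definition that becomes costly at scale, which is the problem the optimizations described in the abstract address.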

Keywords: anonymization; data protection; inference attacks; scalability.

MeSH terms

Biomedical Research
Data Anonymization
Disclosure*
Privacy