Efficient Protection of Health Data from Sensitive Attribute Disclosure

Stud Health Technol Inform. 2020 Jun 16;270:193-197. doi: 10.3233/SHTI200149.

Abstract

Biomedical research has become data-driven. To create the required big datasets, health data needs to be shared or reused outside the context of its original purpose. This leads to significant privacy challenges. Data anonymization is an important protection method in which data is transformed so that privacy guarantees can be provided according to formal models. For practical use, anonymization methods need to be integrated into scalable and robust tools. In this work, we focus on the problem of scalability. Protecting biomedical data from inference attacks is challenging, in particular for numeric data. An important privacy model in this context is t-closeness, which has also been defined for totally ordered attribute values. However, translating the model's mathematical definition directly into a scalable algorithm is difficult. In this paper, we therefore present a series of optimizations that can be used to achieve efficiency in production use. An experimental evaluation shows that our approach reduces the execution times of anonymization processes involving t-closeness by up to a factor of two.
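For readers unfamiliar with the ordered variant of t-closeness, the sketch below shows the unoptimized check that follows directly from the model's definition. It is not the optimized implementation described in this paper. For an equivalence class with sensitive-value distribution P and the overall table distribution Q over m totally ordered values, the ordered-distance Earth Mover's Distance is D[P,Q] = 1/(m-1) · Σ_i |Σ_{j≤i} (p_j − q_j)|, and t-closeness requires D ≤ t for every class. The function names `ordered_emd` and `satisfies_t_closeness` are hypothetical, and the sketch assumes non-empty classes and a `domain` list that gives the sensitive values in their total order.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def ordered_emd(class_values: Sequence[Hashable],
                table_values: Sequence[Hashable],
                domain: Sequence[Hashable]) -> float:
    """Ordered-distance EMD between one equivalence class (P) and the table (Q).

    Hypothetical helper: `domain` lists the m distinct sensitive values
    in their total order (smallest first).
    """
    if not class_values or not table_values:
        raise ValueError("equivalence class and table must be non-empty")
    m = len(domain)
    if m < 2:
        return 0.0  # with a single possible value, P and Q are identical
    p_counts = Counter(class_values)
    q_counts = Counter(table_values)
    n_p, n_q = len(class_values), len(table_values)
    cumulative = 0.0  # running prefix sum of r_j = p_j - q_j
    total = 0.0
    for value in domain:
        cumulative += p_counts[value] / n_p - q_counts[value] / n_q
        total += abs(cumulative)
    return total / (m - 1)


def satisfies_t_closeness(classes: Iterable[Sequence[Hashable]],
                          table_values: Sequence[Hashable],
                          domain: Sequence[Hashable],
                          t: float) -> bool:
    """Naive check: every equivalence class must be within distance t of Q."""
    return all(ordered_emd(c, table_values, domain) <= t for c in classes)


if __name__ == "__main__":
    domain = list(range(3, 12))  # e.g. salaries 3k..11k, one record each
    table = list(range(3, 12))
    print(ordered_emd([3, 4, 5], table, domain))
```

With a uniform table over nine ordered values, a class containing only the three smallest values has distance (2+4+6+5+4+3+2+1)/9 / 8 = 0.375, so it satisfies t-closeness only for t ≥ 0.375. This direct version recounts every class against the full domain, which makes clear why a scalable implementation needs more than a literal transcription of the definition.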

Keywords: anonymization; data protection; inference attacks; scalability.

MeSH terms

  • Biomedical Research
  • Data Anonymization
  • Disclosure*
  • Privacy