The Anonymous Data Warehouse: A Hands-On Framework for Anonymizing Data From Digital Health Applications

Cureus. 2024 Apr 3;16(4):e57519. doi: 10.7759/cureus.57519. eCollection 2024 Apr.

Abstract

The digital health space is growing rapidly, and so is the interest in sharing anonymized health data. However, data anonymization techniques have yet to see much coverage in the medical literature. The purpose of this article is, therefore, to provide a practical framework for anonymization with a focus on the unique properties of data from digital health applications. Literature trends, as well as common anonymization techniques, were synthesized into a framework that considers the opportunities and challenges of digital health data. A rationale for each design decision is provided, and the advantages and disadvantages are discussed. We propose a framework based on storing data separately, anonymizing the data where the identified data is located, only exporting selected data, minimizing static attributes, ensuring k-anonymity of users and their static attributes, and preventing defined metrics from acting as quasi-identifiers by using aggregation, rounding, and capping. Data anonymization requires a pragmatic approach that preserves the utility of the data while minimizing reidentification risk. The proposed framework should be modified according to the characteristics of the respective data set.
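As an illustration of the last two steps named above (k-anonymity over static attributes, then capping and rounding a metric so it cannot act as a quasi-identifier), the following is a minimal Python sketch. It is not the authors' implementation. The function names, the sample records, and the parameter values (k = 2, a cap of 20,000 steps, rounding to the nearest 1,000) are hypothetical and chosen only for demonstration; appropriate values depend on the data set.

```python
from collections import Counter


def cap_and_round(value, cap, step):
    """Cap a metric at `cap`, then round it to the nearest multiple of `step`."""
    capped = min(value, cap)
    return round(capped / step) * step


def enforce_k_anonymity(records, static_keys, k):
    """Keep only records whose combination of static attributes occurs at least k times."""
    def combo(record):
        return tuple(record[key] for key in static_keys)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] >= k]


# Hypothetical sample data: two static attributes and one metric per user.
users = [
    {"sex": "f", "birth_decade": 1980, "daily_steps": 8432},
    {"sex": "f", "birth_decade": 1980, "daily_steps": 41250},
    {"sex": "m", "birth_decade": 1950, "daily_steps": 3120},
]

# Suppress users in groups smaller than k, then coarsen the exported metric.
released = [
    {**u, "daily_steps": cap_and_round(u["daily_steps"], cap=20000, step=1000)}
    for u in enforce_k_anonymity(users, ["sex", "birth_decade"], k=2)
]
```

In this sketch the third user is suppressed because no other record shares the same static attributes, and the extreme step count of the second user is capped so that the outlier value no longer singles that user out.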

Keywords: anonymization; digital health; gdpr; hipaa; wearables.

Conflict of interest statement

A.N., E.C., and P.W. report employment at or consulting for dacadoo AG, a technology company that has built software related to the framework outlined herein. However, the content of this manuscript is not patent protected. P.W. has a patent application titled ‘Method for detection of neurological abnormalities’ outside of the submitted work. The other authors declare no additional conflicts of interest.