Expanding the attack surface: Robust profiling attacks threaten the privacy of sparse behavioral data

Sci Adv. 2022 Aug 19;8(33):eabl6464. doi: 10.1126/sciadv.abl6464. Epub 2022 Aug 19.

Abstract

Behavioral data, collected from our daily interactions with technology, have driven scientific advances. Yet the collection and sharing of these data raise legitimate privacy concerns, as individuals can often be reidentified. Current identification attacks, however, require auxiliary information that roughly matches the information available in the dataset, which limits their applicability. Here, we propose an entropy-based profiling model to learn time-persistent profiles. Using auxiliary information about a single target collected over a nonoverlapping time period, we show that individuals are correctly identified 79% of the time in a large location dataset of 0.5 million individuals and 65.2% of the time in a grocery shopping dataset of 85,000 individuals. We further show that accuracy decreases only slowly over time and that the model is robust to state-of-the-art noise addition. Our results show that much more auxiliary information than previously believed can be used to identify individuals, challenging deidentification practices and what currently constitutes legally anonymous data.
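
The abstract does not spell out the profiling model, so the following is only a toy sketch of the general idea of entropy-weighted profile matching, not the authors' method. It assumes (hypothetically) that each person's profile is a count of visits per item, that an item is weighted by one minus the normalized Shannon entropy of how its use is spread across people, and that a target is matched by a weighted histogram overlap. The names `item_weights` and `identify`, the scoring rule, and the example data are all illustrative assumptions.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw counts into a frequency distribution."""
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}


def item_weights(profiles):
    """Weight each item by 1 - H / H_max (an assumed weighting scheme).

    H is the Shannon entropy of how the item's use is spread across people.
    Items used by nearly everyone get a weight near 0; items specific to
    one person get a weight of 1.
    """
    n_users = len(profiles)
    h_max = math.log(n_users) if n_users > 1 else 1.0
    per_item = {}
    for counts in profiles.values():
        for item, c in counts.items():
            per_item.setdefault(item, []).append(c)
    weights = {}
    for item, cs in per_item.items():
        total = sum(cs)
        h = -sum((c / total) * math.log(c / total) for c in cs)
        weights[item] = 1.0 - h / h_max
    return weights


def identify(profiles, aux_counts):
    """Return the person whose profile best overlaps the auxiliary trace."""
    weights = item_weights(profiles)
    aux = normalize(aux_counts)

    def score(counts):
        p = normalize(counts)
        # Weighted histogram intersection over items seen in the auxiliary data.
        return sum(weights.get(i, 0.0) * min(p.get(i, 0.0), q)
                   for i, q in aux.items())

    return max(profiles, key=lambda user: score(profiles[user]))


# Hypothetical profiles learned from the released dataset (first period).
profiles = {
    "alice": Counter({"station": 5, "cafe_a": 4, "gym": 1}),
    "bob": Counter({"station": 6, "library": 3, "cafe_b": 1}),
    "carol": Counter({"station": 4, "park": 5, "gym": 1}),
}
# Hypothetical auxiliary trace of one target from a later, nonoverlapping period.
aux = Counter({"station": 4, "cafe_a": 3, "library": 1})

weights = item_weights(profiles)
result = identify(profiles, aux)
print(result)
```

In this toy data the shared "station" item contributes almost nothing to the score, while the person-specific "cafe_a" item dominates, which is the intuition behind weighting by entropy: habits that persist over time and are rare across the population carry most of the identifying signal.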