Vulnerability- and Diversity-Aware Anonymization of Personally Identifiable Information for Improving User Privacy and Utility of Publishing Data

Sensors (Basel). 2017 May 8;17(5):1059. doi: 10.3390/s17051059.

Abstract

Personally identifiable information (PII) threatens individual privacy because combinations of PII values can uniquely identify individuals in published data. User PII such as age, race, gender, and zip code contains private information that can help an adversary determine to whom a record belongs. Each type of PII reveals identity to a different degree, and some types are highly identity-vulnerable. More vulnerable types of PII make unique identification easier, and their presence in published data increases privacy risk. Existing privacy models treat all types of PII equally with respect to identity revelation and focus mainly on hiding each user's PII in a crowd of other users. Ignoring the identity vulnerability of each PII type during anonymization is not an effective way to protect user privacy in a fine-grained manner. This paper proposes a new anonymization scheme that takes the identity vulnerability of PII into account to protect user privacy effectively. To anonymize data, the scheme performs generalization adaptively, based on the identity vulnerability of PII as well as on diversity. The resulting anonymized data protect against both identity disclosure and private-information disclosure while maximizing data utility for analyses and for building classification models. The scheme also incurs low computational overhead. Simulation results demonstrate the scheme's effectiveness and support these claims.
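The abstract does not give the scheme's formulas, so the Python sketch below only illustrates the general idea of vulnerability-guided adaptive generalization; it is not the authors' algorithm. Every concrete choice in it is an assumption made for illustration: the vulnerability proxy (the fraction of records that a single attribute's value makes unique), the hypothetical generalization hierarchies (`gen_age`, `gen_zip`, `gen_cat`), the greedy loop in `anonymize`, and the use of k-anonymity plus distinct l-diversity as the privacy check.

```python
from collections import Counter, defaultdict

# Hypothetical generalization hierarchies (illustrative assumptions, not the
# paper's). Level 0 keeps the original value; higher levels are coarser.
def gen_age(age, level):
    """0: exact age; 1: 10-year band; 2: 20-year band; 3: suppressed."""
    if level == 0:
        return str(age)
    if level == 3:
        return "*"
    width = 10 * level
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def gen_zip(zipcode, level):
    """Mask the last `level` digits of a 5-digit zip code (level 0..5)."""
    if level == 0:
        return zipcode
    return zipcode[: len(zipcode) - level] + "*" * level

def gen_cat(value, level):
    """Categorical attribute: kept (level 0) or fully suppressed (level 1)."""
    return value if level == 0 else "*"

# attribute -> (generalization function, maximum level)
GENERALIZERS = {
    "age": (gen_age, 3),
    "gender": (gen_cat, 1),
    "race": (gen_cat, 1),
    "zip": (gen_zip, 5),
}

def vulnerability(rows, attr):
    """Assumed proxy for identity vulnerability: the fraction of rows whose
    value of `attr` alone is unique in the table."""
    counts = Counter(r[attr] for r in rows)
    return sum(1 for r in rows if counts[r[attr]] == 1) / len(rows)

def generalize(records, levels):
    """Apply each quasi-identifier's current level to the original records."""
    return [
        {**r, **{q: GENERALIZERS[q][0](r[q], lvl) for q, lvl in levels.items()}}
        for r in records
    ]

def satisfies(rows, qis, sensitive, k, l):
    """True if every equivalence class has >= k rows and >= l distinct
    sensitive values (k-anonymity plus distinct l-diversity)."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[q] for q in qis)].append(r[sensitive])
    return all(len(g) >= k and len(set(g)) >= l for g in groups.values())

def anonymize(records, qis, sensitive, k=2, l=2):
    """Greedy adaptive generalization: while the privacy check fails, coarsen
    the quasi-identifier that is currently most identity-vulnerable."""
    levels = {q: 0 for q in qis}
    while True:
        rows = generalize(records, levels)
        if satisfies(rows, qis, sensitive, k, l):
            return rows, levels
        candidates = [q for q in qis if levels[q] < GENERALIZERS[q][1]]
        if not candidates:
            raise ValueError("k/l cannot be met with these hierarchies")
        target = max(candidates, key=lambda q: vulnerability(rows, q))
        levels[target] += 1

# Toy, made-up records for demonstration only.
QIS = ["age", "gender", "race", "zip"]
data = [
    {"age": 23, "gender": "F", "race": "A", "zip": "13053", "disease": "flu"},
    {"age": 27, "gender": "M", "race": "B", "zip": "13068", "disease": "cancer"},
    {"age": 35, "gender": "F", "race": "A", "zip": "14853", "disease": "flu"},
    {"age": 36, "gender": "M", "race": "A", "zip": "14850", "disease": "hiv"},
    {"age": 41, "gender": "F", "race": "B", "zip": "13053", "disease": "cancer"},
    {"age": 48, "gender": "M", "race": "B", "zip": "13068", "disease": "flu"},
]

rows, levels = anonymize(data, QIS, "disease", k=2, l=2)
print(levels)
for row in rows:
    print(row)
```

Recomputing the vulnerability proxy on the partially generalized table after each step is one possible design choice: once an attribute has been coarsened enough that its values no longer single out records, it stops attracting further generalization, so the loss of utility is concentrated on the attributes that currently carry the most identity risk.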

Keywords: adaptive generalization; diversity; identity vulnerability; personally identifiable information; privacy; utility.