Privacy-Preserving Database Fingerprinting

Tianxi Ji; Erman Ayday; Emre Yilmaz; Ming Li; Pan Li

doi:10.14722/ndss.2023.24693

NDSS Symp. 2023 Feb-Mar:2023:10.14722/ndss.2023.24693. doi: 10.14722/ndss.2023.24693.

Authors

Tianxi Ji¹, Erman Ayday², Emre Yilmaz³, Ming Li⁴, Pan Li²

Affiliations

¹ Texas Tech University.
² Case Western Reserve University.
³ University of Houston-Downtown.
⁴ University of Texas at Arlington.

Abstract

When sharing relational databases with other parties, a database owner aims not only to provide a high-quality (high-utility) database to the recipients, but also to have (i) privacy guarantees for the data entries and (ii) liability guarantees (via fingerprinting) in case of unauthorized redistribution. However, (i) and (ii) are orthogonal objectives: when sharing a database with multiple recipients, privacy via data sanitization requires adding noise once (and sharing the same noisy version with all recipients), whereas liability via unique fingerprint insertion requires adding different noise to each shared copy to distinguish all recipients. Although achieving (i) and (ii) together is possible in a naïve way (e.g., differentially-private database perturbation or synthesis followed by fingerprinting), this approach results in significant degradation in the utility of shared databases. In this paper, we achieve privacy and liability guarantees simultaneously by proposing a novel entry-level differentially-private (DP) fingerprinting mechanism for relational databases without causing large utility degradation. The proposed mechanism fulfills the privacy and liability requirements by leveraging the inherent randomization of fingerprinting and transforming it into provable privacy guarantees. Specifically, we devise a bit-level random response scheme to achieve a differential privacy guarantee for arbitrary data entries when sharing the entire database, and then, based on this, we develop an $\epsilon$-entry-level DP fingerprinting mechanism. We theoretically analyze the connections between privacy, fingerprint robustness, and database utility by deriving closed-form expressions. We also propose a sparse vector technique-based solution to control the cumulative privacy loss when fingerprinted copies of a database are shared with multiple recipients.
We experimentally show that our mechanism achieves strong fingerprint robustness (e.g., the fingerprint cannot be compromised even if the malicious database recipient modifies/distorts more than half of the entries in its received fingerprinted copy), and higher database utility compared to various baseline methods (e.g., application-dependent database utility of the shared database achieved by the proposed mechanism is higher than that of the considered baselines).
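
The bit-level random response idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each entry is a non-negative integer encoded in K bits, that every bit is flipped independently with probability p < 1/2, and that the flips for each recipient are driven by a recipient-specific seed. Under those assumptions, the standard randomized-response calculation bounds the likelihood ratio between any two K-bit entry values by ((1 - p) / p)^K, so choosing p = 1 / (exp(epsilon / K) + 1) gives a privacy parameter of epsilon per entry. The names flip_probability, perturb_entry, and fingerprint_copy are hypothetical.

```python
import math
import random


def flip_probability(epsilon: float, num_bits: int) -> float:
    """Per-bit flip probability p solving K * ln((1 - p) / p) = epsilon.

    Assumption: K independent bit flips per entry, standard randomized response.
    """
    if epsilon <= 0 or num_bits <= 0:
        raise ValueError("epsilon and num_bits must be positive")
    return 1.0 / (math.exp(epsilon / num_bits) + 1.0)


def perturb_entry(value: int, num_bits: int, p: float, rng: random.Random) -> int:
    """XOR each of the num_bits low-order bits of value with a Bernoulli(p) bit."""
    mask = 0
    for k in range(num_bits):
        if rng.random() < p:
            mask |= 1 << k
    return value ^ mask


def fingerprint_copy(entries, num_bits: int, epsilon: float, seed: int):
    """Produce one perturbed copy of the entries.

    Assumption: the seed is unique per recipient, so each recipient's copy
    carries a different pattern of flipped bits (hypothetical setup).
    """
    rng = random.Random(seed)
    p = flip_probability(epsilon, num_bits)
    return [perturb_entry(v, num_bits, p, rng) for v in entries]


if __name__ == "__main__":
    column = [3, 7, 0, 12, 9]  # toy 4-bit entries, values in [0, 16)
    copy_a = fingerprint_copy(column, num_bits=4, epsilon=2.0, seed=101)
    copy_b = fingerprint_copy(column, num_bits=4, epsilon=2.0, seed=202)
    print(copy_a, copy_b)
```

A smaller epsilon pushes p toward 1/2, so more bits are flipped; in this sketch that means stronger privacy and a denser per-recipient mark, at the cost of lower utility.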

Grants and funding

R01 LM013429/LM/NLM NIH HHS/United States