Using game theory to thwart multistage privacy intrusions when sharing data

Zhiyu Wan; Yevgeniy Vorobeychik; Weiyi Xia; Yongtai Liu; Myrna Wooders; Jia Guo; Zhijun Yin; Ellen Wright Clayton; Murat Kantarcioglu; Bradley A Malin

doi:10.1126/sciadv.abe9986

Sci Adv. 2021 Dec 10;7(50):eabe9986. doi: 10.1126/sciadv.abe9986. Epub 2021 Dec 10.

Authors

Zhiyu Wan¹ ², Yevgeniy Vorobeychik³, Weiyi Xia², Yongtai Liu¹, Myrna Wooders⁴, Jia Guo¹, Zhijun Yin¹ ², Ellen Wright Clayton⁵ ⁶ ⁷, Murat Kantarcioglu⁸ ⁹ ¹⁰, Bradley A Malin¹ ² ¹¹

Affiliations

¹ Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA.
² Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA.
³ Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA.
⁴ Department of Economics, Vanderbilt University, Nashville, TN 37235, USA.
⁵ Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN 37203, USA.
⁶ School of Law, Vanderbilt University, Nashville, TN 37203, USA.
⁷ Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
⁸ Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA.
⁹ Institute for Quantitative Social Science, Harvard University, Cambridge, MA 02138, USA.
¹⁰ Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA.
¹¹ Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA.

Abstract

Person-specific biomedical data are now widely collected, but sharing them raises privacy concerns, specifically about the re-identification of seemingly anonymous records. Formal re-identification risk assessment frameworks can inform decisions about whether and how to share data; current techniques, however, focus on scenarios where the data recipients use only one resource for re-identification purposes. This is a concern because recent attacks show that adversaries can access multiple resources, combining them in a stage-wise manner, to enhance the chance of an attack’s success. In this work, we represent a re-identification game using a two-player Stackelberg game of perfect information, which can be applied to assess risk and to suggest an optimal data sharing strategy based on a privacy-utility tradeoff. We report on experiments with large-scale genomic datasets to show that, using game theoretic models accounting for adversarial capabilities to launch multistage attacks, most data can be effectively shared with low re-identification risk.
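
As an illustration only, the leader-follower logic of a two-player Stackelberg game of perfect information can be sketched with backward induction: the data sharer (leader) commits to a release, the adversary (follower) observes it and best-responds, and the sharer picks the release that maximizes its own payoff given that response. The sketch below is not the authors' model. The attribute names, utilities, risk contributions, costs, and the additive risk formula are all hypothetical, and the multistage attack is collapsed into a single attack or no-attack decision for brevity.

```python
from itertools import product

# All values below are hypothetical toy numbers, not taken from the paper.
ATTRIBUTES = ("age", "zip", "genome_region")
UTILITY = {"age": 2.0, "zip": 3.0, "genome_region": 6.0}  # sharer's benefit per released attribute
RISK = {"age": 0.05, "zip": 0.10, "genome_region": 0.30}  # assumed additive re-identification risk
ATTACK_COST = 1.0    # adversary's cost of attempting re-identification
ATTACK_GAIN = 5.0    # adversary's gain from a successful re-identification
PRIVACY_LOSS = 20.0  # sharer's loss from a successful re-identification


def success_probability(shared):
    """Assumed probability that an attack on the released attributes succeeds."""
    return min(1.0, sum(RISK[a] for a in shared))


def adversary_attacks(shared):
    """Follower's best response: attack only if the expected payoff is positive."""
    return ATTACK_GAIN * success_probability(shared) - ATTACK_COST > 0


def sharer_payoff(shared):
    """Leader's payoff, anticipating the follower's best response."""
    utility = sum(UTILITY[a] for a in shared)
    if adversary_attacks(shared):
        return utility - PRIVACY_LOSS * success_probability(shared)
    return utility


def solve():
    """Enumerate every subset of attributes and return the leader's best release."""
    strategies = [
        tuple(a for a, keep in zip(ATTRIBUTES, mask) if keep)
        for mask in product((False, True), repeat=len(ATTRIBUTES))
    ]
    return max(strategies, key=sharer_payoff)


if __name__ == "__main__":
    best = solve()
    print(best, sharer_payoff(best))
```

With these toy numbers the sharer releases only the attributes whose combined risk keeps an attack unprofitable for the adversary, which is the privacy-utility tradeoff the abstract refers to. A fuller sketch would give the adversary several sequential attack stages, each with its own cost and success probability.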

Grants and funding