Accessibility of covariance information creates vulnerability in Federated Learning frameworks

Manuel Huth; Jonas Arruda; Roy Gusinow; Lorenzo Contento; Evelina Tacconelli; Jan Hasenauer

doi:10.1093/bioinformatics/btad531

Bioinformatics. 2023 Sep 2;39(9):btad531. doi: 10.1093/bioinformatics/btad531.

Authors

Manuel Huth¹,², Jonas Arruda², Roy Gusinow¹,², Lorenzo Contento², Evelina Tacconelli³, Jan Hasenauer²

Affiliations

¹ Institute of Computational Biology, Helmholtz Munich, Neuherberg 85764, Germany.
² Life and Medical Sciences Institute, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn 53115, Germany.
³ Division of Infectious Diseases, Department of Diagnostics and Public Health, University of Verona, Verona 37124, Italy.

Abstract

Motivation: Federated Learning (FL) is gaining traction in various fields, such as healthcare, because it enables integrative data analysis without sharing sensitive data. However, the risk of data leakage caused by malicious attacks must be considered. In this study, we introduce a novel attack algorithm that requires only three capabilities on the data owner side: computing sample means, computing sample covariances, and constructing known linearly independent vectors.

Results: We show that these basic functionalities, which are available in several established FL frameworks, are sufficient to reconstruct privacy-protected data. Additionally, the attack algorithm is robust to defense strategies that involve adding random noise. We demonstrate the limitations of existing frameworks, propose potential defense strategies, and analyze the implications of using differential privacy. The novel insights presented in this study will aid in the improvement of FL frameworks.
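To illustrate why means, covariances, and known linearly independent vectors can be enough to leak data, the following is a minimal sketch and not the authors' implementation (their code is in the linked GitHub repository). It assumes a noise-free setting, a private vector of length n, and an analyst who may place n known vectors next to the private data and query only aggregate statistics. The names `query_mean`, `query_cov`, and `x_private` are hypothetical stand-ins for what an FL framework might expose. The sketch uses the identity (n - 1) * cov(x, v) = x·v - n * mean(x) * mean(v), which turns each covariance query into one linear equation in the unknown entries of x.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Private data held by the data owner; the analyst never reads it directly.
x_private = rng.normal(50.0, 10.0, size=n)

# Hypothetical aggregate-only interface (assumed, not a real framework API).
def query_mean(v):
    return float(np.mean(v))

def query_cov(a, b):
    # Unbiased sample covariance between two vectors of equal length.
    return float(np.cov(a, b, ddof=1)[0, 1])

# Known vectors chosen by the analyst, one per row.
# A random Gaussian matrix is linearly independent with probability 1.
V = rng.normal(size=(n, n))

# Each query yields the dot product x . v_j via
# (n - 1) * cov(x, v_j) + n * mean(x) * mean(v_j).
mean_x = query_mean(x_private)
b = np.array([
    (n - 1) * query_cov(x_private, v) + n * mean_x * query_mean(v)
    for v in V
])

# Solve the linear system V x = b for the private vector.
x_reconstructed = np.linalg.solve(V, b)
```

In this idealized setting the n aggregate queries determine the private vector exactly, up to floating-point error. How the attack behaves when random noise is added to the returned statistics is the subject of the paper and is not modeled here.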

Availability and implementation: The code examples are provided at GitHub (https://github.com/manuhuth/Data-Leakage-From-Covariances.git). The CNSIM1 dataset, which we used in the manuscript, is available within the DSData R package (https://github.com/datashield/DSData/tree/main/data).

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Data Analysis*
Privacy