A Privacy-Preserving Distributed Analytics Platform for Health Care Data

Methods Inf Med. 2022 Jun;61(S 01):e1-e11. doi: 10.1055/s-0041-1740564. Epub 2022 Jan 17.

Abstract

Background: In recent years, data-driven medicine has gained increasing importance in terms of diagnosis, treatment, and research due to the exponential growth of health care data. However, data protection regulations prohibit data centralisation for analysis purposes because of potential privacy risks like the accidental disclosure of data to third parties. Therefore, alternative data usage policies, which comply with present privacy guidelines, are of particular interest.

Objective: We aim to enable analyses of sensitive patient data while complying with local data protection regulations, using an approach called the Personal Health Train (PHT), a paradigm that utilises distributed analytics (DA) methods. The main principle of the PHT is that the analytical task is brought to the data provider, while the data instances remain in their original location.

Methods: In this work, we present our implementation of the PHT paradigm, which preserves the sovereignty and autonomy of the data providers and operates with a limited number of communication channels. We further conduct a DA use case on data held by three distinct, distributed data providers.
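
The principle of bringing the analytical task to the data can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Station`, `Train`, and `dispatch`, the sequential route, and the choice of a simple mean as the analysis are all assumptions made for illustration. The point it shows is that only aggregate state travels between providers, while the records stay where they are.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Train:
    """Hypothetical analytical task; it carries only aggregate state."""
    count: int = 0
    total: float = 0.0

    def run(self, records: List[float]) -> None:
        # Update the running count and sum; raw values are not retained.
        self.count += len(records)
        self.total += sum(records)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


@dataclass
class Station:
    """Hypothetical data provider; its records are never copied out."""
    name: str
    records: List[float] = field(default_factory=list, repr=False)

    def execute(self, train: Train) -> None:
        # The station runs the visiting task locally on its own data.
        train.run(self.records)


def dispatch(train: Train, stations: List[Station]) -> Train:
    # Visit each station in turn (the route order is an assumption).
    for station in stations:
        station.execute(train)
    return train


# Three illustrative providers holding made-up values.
stations = [
    Station("A", [1.0, 2.0]),
    Station("B", [3.0]),
    Station("C", [4.0, 5.0, 6.0]),
]
result = dispatch(Train(), stations)
```

After the route completes, `result.mean` is the mean over all six values, computed without any station handing its records to another party. A real deployment would replace the mean with model training and would add the access control and transport that the sketch leaves out.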

Results: We show that our infrastructure enables the training of data models based on distributed data sources.

Conclusion: Our work presents the capabilities of DA infrastructures in the health care sector, which lower the regulatory obstacles to sharing patient data. We further demonstrate their ability to fuel medical science by making distributed data sets available to scientists or health care practitioners.

MeSH terms

  • Computer Security*
  • Delivery of Health Care
  • Humans
  • Information Storage and Retrieval
  • Privacy*

Grants and funding

Funding: This work was supported by the German Ministry for Research and Education (BMBF) as part of the SMITH consortium (SW, LN, MJ, YUY, TK, SD, and OB, grant no. 01ZZ1803K). This work was conducted jointly by RWTH Aachen University and Fraunhofer FIT as part of the PHT and Go FAIR implementation network, which aims to develop a proof-of-concept information system to address current data reusability challenges occurring in the context of so-called data integration centres that are being established as part of ongoing German Medical Informatics BMBF projects.