Studying the association of diabetes and healthcare cost on distributed data from the Maastricht Study and Statistics Netherlands using a privacy-preserving federated learning infrastructure

J Biomed Inform. 2022 Oct:134:104194. doi: 10.1016/j.jbi.2022.104194. Epub 2022 Sep 5.

Abstract

The mining of personal data collected by multiple organizations remains challenging in the presence of technical barriers, privacy concerns, and legal and/or organizational restrictions. While a number of privacy-preserving data mining frameworks have recently emerged, their practical utility remains largely to be demonstrated. In this study, we implement and use a secure infrastructure with data from Statistics Netherlands and the Maastricht Study to learn the association between Type 2 Diabetes Mellitus (T2DM) and healthcare expenses, considering the impact of lifestyle, physical activity, and complications of T2DM. Through experiments on real-world distributed personal data, we demonstrate the feasibility and effectiveness of the secure infrastructure for practical use cases that link and analyze vertically partitioned data across multiple organizations. We found that individuals diagnosed with T2DM had significantly higher expenses than those with prediabetes, while participants with prediabetes spent more than those without T2DM in all included healthcare categories, to varying degrees. We further discuss why a joint effort by technical, ethical-legal, and domain experts is highly valuable when applying such a secure infrastructure to real-life use cases while protecting data privacy.
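In vertically partitioned data, each organization holds different attributes about the same individuals. For example, one party holds clinical cohort variables and another holds healthcare costs, so the records must be linked before a joint analysis. The minimal sketch below illustrates one generic way to do this. It assumes a keyed-hash (HMAC-SHA256) pseudonymization scheme, a hypothetical shared key, and a hypothetical linking step. It is an illustrative stand-in, not the infrastructure used in this study, and all records are made-up toy data.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret agreed between the data parties (assumption for illustration).
SHARED_KEY = b"example-secret-agreed-offline"

def pseudonymize(records, key):
    """Replace each direct identifier with a keyed hash (HMAC-SHA256)."""
    out = {}
    for person_id, features in records.items():
        token = hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()
        out[token] = features
    return out

def link(party_a, party_b):
    """Join two vertically partitioned tables on matching pseudonyms only."""
    common = party_a.keys() & party_b.keys()
    return {t: {**party_a[t], **party_b[t]} for t in common}

def mean_cost_by_group(linked):
    """Average the (hypothetical) annual_cost attribute per glucose_status group."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in linked.values():
        totals[row["glucose_status"]] += row["annual_cost"]
        counts[row["glucose_status"]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy records: party A holds a clinical attribute, party B holds costs.
cohort = {
    "p001": {"glucose_status": "T2DM"},
    "p002": {"glucose_status": "prediabetes"},
    "p003": {"glucose_status": "normal"},
}
costs = {
    "p001": {"annual_cost": 4200.0},
    "p003": {"annual_cost": 900.0},
    "p004": {"annual_cost": 1500.0},
}

linked = link(pseudonymize(cohort, SHARED_KEY), pseudonymize(costs, SHARED_KEY))
print(mean_cost_by_group(linked))
```

Only individuals present at both parties (here `p001` and `p003`) survive the join, and the linked table carries pseudonyms rather than raw identifiers. Keyed hashing alone is not a complete privacy guarantee. Anyone holding the key can re-derive pseudonyms, and the linking step learns the size of the overlap, which is why real deployments add further technical and organizational safeguards.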

Keywords: Distributed data; Federated learning; Healthcare cost; Privacy-preserving data mining; Type 2 diabetes; Vertically partitioned data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Diabetes Mellitus, Type 2* / therapy
  • Health Care Costs
  • Humans
  • Netherlands
  • Prediabetic State*
  • Privacy