Investigating the impact of data heterogeneity on the performance of federated learning algorithm using medical imaging

PLoS One. 2024 May 15;19(5):e0302539. doi: 10.1371/journal.pone.0302539. eCollection 2024.

Abstract

In recent years, Federated Learning (FL) has gained traction as a privacy-centric approach in medical imaging. This study explores the challenges that data heterogeneity poses for FL algorithms, using the COVIDx CXR-3 dataset as a case study. We contrast the performance of the Federated Averaging (FedAvg) algorithm on non-independent and identically distributed (non-IID) data with its performance on independent and identically distributed (IID) data. Our findings reveal a notable performance decline as data heterogeneity increases, emphasizing the need for innovative strategies to enhance FL in diverse environments. This research contributes to the practical implementation of FL, extending beyond theoretical concepts and addressing the nuances of medical imaging applications. By uncovering the inherent challenges that data diversity creates for FL, it sets the stage for future advances in FL strategies that manage data heterogeneity effectively, especially in sensitive fields such as healthcare.
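To make the IID versus non-IID comparison concrete, the following is a minimal sketch of the FedAvg aggregation step (a weighted average of client parameters, weighted by local sample counts) together with two ways of partitioning a labelled dataset across clients. The abstract does not state how the study partitioned COVIDx CXR-3, so the label-sorted split is only one common, assumed way of inducing heterogeneity; the function names `fedavg`, `split_iid`, and `split_label_skew` are hypothetical and chosen for illustration.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average client parameter vectors,
    weighting each client by its number of local samples."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)            # shape: (clients, params)
    shares = sizes / sizes.sum()                  # each client's data share
    return (stacked * shares[:, None]).sum(axis=0)


def split_iid(labels, n_clients, rng):
    """IID split: shuffle sample indices, then deal them out evenly."""
    idx = rng.permutation(len(labels))
    return np.array_split(idx, n_clients)


def split_label_skew(labels, n_clients):
    """Assumed non-IID split: sort indices by label before splitting,
    so each client sees only a narrow range of classes."""
    idx = np.argsort(labels, kind="stable")
    return np.array_split(idx, n_clients)
```

With a balanced two-class label vector and two clients, `split_label_skew` gives each client a single class, whereas `split_iid` gives each client a random mixture of both. Training the same model under the two splits and aggregating with `fedavg` is the kind of comparison the abstract describes.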

MeSH terms

  • Algorithms*
  • COVID-19 / diagnostic imaging
  • COVID-19 / epidemiology
  • Diagnostic Imaging* / methods
  • Humans
  • Machine Learning
  • SARS-CoV-2 / isolation & purification

Grants and funding

This work was supported by the research grant [SEED-CCIS-2023-166]; Prince Sultan University, Riyadh, Saudi Arabia. The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.