A systematic review of federated learning applications for biomedical data

Matthew G Crowson; Dana Moukheiber; Aldo Robles Arévalo; Barbara D Lam; Sreekar Mantena; Aakanksha Rana; Deborah Goss; David W Bates; Leo Anthony Celi

doi:10.1371/journal.pdig.0000033

PLOS Digit Health. 2022 May 19;1(5):e0000033. doi: 10.1371/journal.pdig.0000033. eCollection 2022 May.

Authors

Matthew G Crowson¹,², Dana Moukheiber³, Aldo Robles Arévalo⁴,⁵, Barbara D Lam⁶, Sreekar Mantena⁷, Aakanksha Rana⁸, Deborah Goss¹, David W Bates⁹,¹⁰, Leo Anthony Celi¹¹,¹²

Affiliations

¹ Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, Massachusetts, United States of America.
² Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Massachusetts, United States of America.
³ Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, United States of America.
⁴ IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal.
⁵ Data & Analytics, NTT DATA Portugal, Lisbon, Portugal.
⁶ Department of Hematology & Oncology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America.
⁷ Harvard College, Boston, Massachusetts, United States of America.
⁸ Massachusetts Institute of Technology, Boston, Massachusetts, United States of America.
⁹ Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, United States of America.
¹⁰ Department of Health Policy and Management, Harvard T. H. Chan School of Public Health, Boston, MA, United States of America.
¹¹ Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America.
¹² Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America.

Abstract

Objectives: Federated learning (FL) allows multiple institutions to collaboratively develop a machine learning algorithm without sharing their data. Organizations instead share model parameters only, allowing them to benefit from a model built with a larger dataset while maintaining the privacy of their own data. We conducted a systematic review to evaluate the current state of FL in healthcare and discuss the limitations and promise of this technology.
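To make the parameter-sharing idea concrete, the following is a minimal sketch of one common aggregation scheme, a sample-size-weighted average of locally trained weights (often called federated averaging). It is an illustration under stated assumptions, not the method of any study in this review: the linear model, the learning rate, the toy site datasets, and the function names `local_update`, `federated_average`, and `run_rounds` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Dataset = Sequence[Tuple[float, float]]  # private (x, y) pairs held by one site


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.3, epochs: int = 5) -> Weights:
    """Train y ~ w0 + w1*x on one site's private data; only weights leave the site."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w0 and w1.
        g0 = 2.0 * sum((w0 + w1 * x - y) for x, y in data) / n
        g1 = 2.0 * sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return [w0, w1]


def federated_average(site_weights: Sequence[Weights],
                      site_sizes: Sequence[int]) -> Weights:
    """Server step: average the sites' weights, weighted by local sample count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def run_rounds(sites: Sequence[Dataset], rounds: int = 100) -> Weights:
    """Alternate local training and server aggregation; raw data is never pooled."""
    global_w: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


# Hypothetical toy data: two sites whose points all lie on y = 1 + 2x.
SITE_A: Dataset = [(0.0, 1.0), (0.5, 2.0)]
SITE_B: Dataset = [(1.0, 3.0), (0.25, 1.5), (0.75, 2.5)]

if __name__ == "__main__":
    print(run_rounds([SITE_A, SITE_B]))
```

In this sketch the server sees only the two-element weight vectors and the sample counts, never the `(x, y)` pairs. Real deployments typically add safeguards that the sketch omits, such as secure aggregation or differential privacy, because shared parameters can still leak information about the underlying data.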

Methods: We conducted a literature search using PRISMA guidelines. At least two reviewers assessed each study for eligibility and extracted a predetermined set of data. The quality of each study was determined using the TRIPOD guideline and PROBAST tool.

Results: Thirteen studies were included in the full systematic review. Most were in the field of oncology (6 of 13; 46.2%), followed by radiology (5 of 13; 38.5%). The majority evaluated imaging results, performed a binary classification prediction task via offline learning (n = 12; 92.3%), and used a centralized-topology, aggregation-server workflow (n = 10; 76.9%). Most studies were compliant with the major reporting requirements of the TRIPOD guidelines. In all, 6 of 13 studies (46.2%) were judged to be at high risk of bias using the PROBAST tool, and only 5 studies used publicly available data.

Conclusion: Federated learning is a growing field in machine learning with many promising uses in healthcare. Few studies have been published to date. Our evaluation found that investigators can do more to address the risk of bias and increase transparency by adding steps for data homogeneity or sharing required metadata and code.

Copyright: © 2022 Crowson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Grants and funding

T15 LM007092/LM/NLM NIH HHS/United States