Federated Learning for Multicenter Collaboration in Ophthalmology: Improving Classification Performance in Retinopathy of Prematurity

Charles Lu; Adam Hanif; Praveer Singh; Ken Chang; Aaron S Coyner; James M Brown; Susan Ostmo; Robison V Paul Chan; Daniel Rubin; Michael F Chiang; John Peter Campbell; Jayashree Kalpathy-Cramer; Imaging and Informatics in Retinopathy of Prematurity Consortium

doi:10.1016/j.oret.2022.02.015

Ophthalmol Retina. 2022 Aug;6(8):657-663. doi: 10.1016/j.oret.2022.02.015. Epub 2022 Mar 14.

Collaborators

Imaging and Informatics in Retinopathy of Prematurity Consortium. Members of the Imaging and Informatics in Retinopathy of Prematurity research consortium are as follows:
Michael F Chiang, Susan Ostmo, Sang Jin Kim, Kemal Sonmez, John Peter Campbell, Robert Schelonka, Aaron Coyner, R V Paul Chan, Karyn Jonas, Bhavana Kolli, Jason Horowitz, Osode Coki, Cheryl-Ann Eccles, Leora Sarna, Anton Orlin, Audina Berrocal, Catherin Negron, Kimberly Denser, Kristi Cumming, Tammy Osentoski, Tammy Check, Mary Zajechowski, Thomas Lee, Aaron Nagiel, Evan Kruger, Kathryn McGovern, Dilshad Contractor, Margaret Havunjian, Charles Simmons, Raghu Murthy, Sharon Galvis, Jerome Rotter, Ida Chen, Xiaohui Li, Kent Taylor, Kaye Roll, Mary Elizabeth Hartnett, Leah Owen, Darius Moshfeghi, Mariana Nunez, Zac Wennber-Smith, Jayashree Kalpathy-Cramer, Deniz Erdogmus, Stratis Ioannidis, Maria Ana Martinez-Castellanos, Samantha Salinas-Longoria, Rafael Romero, Andrea Arriola, Francisco Olguin-Manriquez, Miroslava Meraz-Gutierrez, Carlos M Dulanto-Reinoso, Cristina Montero-Mendoza

Affiliations

¹ Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts.
² Department of Ophthalmology, Oregon Health and Science University, Portland, Oregon.
³ School of Computer Science, University of Lincoln, Lincoln, United Kingdom.
⁴ Ophthalmology and Visual Sciences, University of Illinois at Chicago, Chicago, Illinois.
⁵ Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California.
⁶ National Eye Institute, National Institutes of Health, Bethesda, Maryland.
⁷ Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women's Hospital, Boston, Massachusetts. Electronic address: JKALPATHY-CRAMER@mgh.harvard.edu.

PMID: 35296449
DOI: 10.1016/j.oret.2022.02.015

Abstract

Objective: To compare the performance of deep learning classifiers for the diagnosis of plus disease in retinopathy of prematurity (ROP) trained using 2 methods for developing models on multi-institutional data sets: centralizing data versus federated learning (FL) in which no data leave each institution.
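
The abstract does not say how the FL model combines what each institution learns. A minimal sketch of one common scheme, federated averaging, follows. It assumes each site sends only its model parameters, never its images, and that the server averages them weighted by local training-set size. The function name, parameter vectors, and site sizes are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Return the size-weighted mean of per-site parameter vectors.

    Assumption: FedAvg-style aggregation. Only parameters are shared;
    the raw images stay at each institution.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one parameter vector and one size per site")
    n_params = len(site_weights[0])
    if any(len(w) != n_params for w in site_weights):
        raise ValueError("all sites must share the same model shape")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total number of training examples must be positive")
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical two-site round: the larger site pulls the average toward it.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_weights)
```

In a full training loop this aggregation would repeat every round: the server broadcasts the averaged parameters, each site trains locally for a few steps, and the updated parameters are averaged again.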

Design: Evaluation of a diagnostic test or technology.

Subjects: Deep learning models were trained, validated, and tested on 5255 wide-angle retinal images collected in the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP study. All images were labeled for the presence of plus, preplus, or no plus disease with a clinical label and a reference standard diagnosis (RSD) determined by 3 image-based ROP graders and the clinical diagnosis.

Methods: We compared the area under the receiver operating characteristic curve (AUROC) for models developed on multi-institutional data, first using a centralized approach and then FL, and compared locally trained models against both approaches. We compared the model performance (κ) with the label agreement (between clinical and RSD), data set size, and number of plus disease cases in each training cohort using the Spearman correlation coefficient (CC).
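
To make the AUROC metric concrete, here is a minimal sketch that computes it as the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one, with ties counting half. It assumes a binary framing (plus versus not plus), which the abstract does not spell out. The labels and scores are hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model outputs where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four images (not study data).
example_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)
```

The pairwise loop is quadratic in the number of images. It is fine for illustration, but a rank-based implementation would be preferable for thousands of images.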

Main outcome measures: Model performance using AUROC and linearly weighted κ.
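
Linearly weighted κ penalizes a disagreement in proportion to how many ordinal steps apart the two labels are, so confusing no plus with plus costs twice as much as confusing no plus with preplus. The sketch below assumes the three grades are encoded as 0 (no plus), 1 (preplus), and 2 (plus). The encoding and example labels are hypothetical.

```python
from typing import Sequence


def linear_weighted_kappa(a: Sequence[int], b: Sequence[int], n_classes: int) -> float:
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("need at least two classes")
    n = len(a)
    k = n_classes
    observed = [[0] * k for _ in range(k)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single class")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical grades: 0 = no plus, 1 = preplus, 2 = plus (not study data).
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 1]
example_kappa = linear_weighted_kappa(reference, predicted, n_classes=3)
print(round(example_kappa, 4))
```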

Results: Four experimental settings were compared: FL trained on RSD against central trained on RSD, FL trained on clinical labels against central trained on clinical labels, FL trained on RSD against central trained on clinical labels, and FL trained on clinical labels against central trained on RSD (P = 0.046, P = 0.126, P = 0.224, and P = 0.0173, respectively). Four of the 7 (57%) models trained on local institutional data performed worse than the FL models. The model performance for local models was positively correlated with the label agreement (between clinical and RSD labels, CC = 0.389, P = 0.387), total number of plus cases (CC = 0.759, P = 0.047), and overall training set size (CC = 0.924, P = 0.002).
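
The correlation coefficients above are Spearman coefficients, which measure how consistently one quantity rises with another by correlating their ranks rather than their raw values. A minimal sketch follows. The per-site training set sizes and κ values in the example are hypothetical illustrations, not the study's data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-site values (not study data).
train_sizes = [120, 340, 560, 800, 950]
site_kappas = [0.41, 0.55, 0.52, 0.70, 0.78]
example_rho = spearman(train_sizes, site_kappas)
print(round(example_rho, 3))
```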

Conclusions: We found that a trained FL model performs comparably to a centralized model, confirming that FL may provide an effective, more feasible solution for interinstitutional learning. Smaller institutions benefit more from collaboration than larger institutions, showing the potential of FL for addressing disparities in resource access.

Keywords: Deep learning; Epidemiology; Federated learning; Retinopathy of prematurity.

Publication types

Multicenter Study
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Diagnostic Imaging
Humans
Infant, Newborn
Ophthalmology* / education
ROC Curve
Reproducibility of Results
Retinopathy of Prematurity* / diagnosis
