Mining multi-center heterogeneous medical data with distributed synthetic learning

Qi Chang; Zhennan Yan; Mu Zhou; Hui Qu; Xiaoxiao He; Han Zhang; Lohendran Baskaran; Subhi Al'Aref; Hongsheng Li; Shaoting Zhang; Dimitris N Metaxas

doi:10.1038/s41467-023-40687-y


Nat Commun. 2023 Sep 7;14(1):5510. doi: 10.1038/s41467-023-40687-y.

Authors

Qi Chang^#¹, Zhennan Yan^#², Mu Zhou^#²,³, Hui Qu¹, Xiaoxiao He¹, Han Zhang¹, Lohendran Baskaran⁴, Subhi Al'Aref⁵, Hongsheng Li⁶,⁷, Shaoting Zhang⁸,⁹,¹⁰, Dimitris N Metaxas¹¹

Affiliations

¹ Department of Computer Science, Rutgers University, Piscataway, NJ, USA.
² SenseBrain Research, Princeton, NJ, USA.
³ Shanghai Artificial Intelligence Laboratory, Shanghai, China.
⁴ Department of Cardiovascular Medicine, National Heart Centre Singapore, and Duke-National University Of Singapore, Singapore, Singapore.
⁵ Department of Medicine, Division of Cardiology, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
⁶ Chinese University of Hong Kong, Hong Kong SAR, China. hsli@ee.cuhk.edu.hk.
⁷ Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong SAR, China. hsli@ee.cuhk.edu.hk.
⁸ Shanghai Artificial Intelligence Laboratory, Shanghai, China. zhangshaoting@pjlab.org.cn.
⁹ Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong SAR, China. zhangshaoting@pjlab.org.cn.
¹⁰ SenseTime, Shanghai, China. zhangshaoting@pjlab.org.cn.
¹¹ Department of Computer Science, Rutgers University, Piscataway, NJ, USA. dnm@cs.rutgers.edu.

^# Contributed equally.

Abstract

Overcoming barriers to the use of multi-center data for medical analytics is challenging because of privacy protection requirements and data heterogeneity in the healthcare system. In this study, we propose the Distributed Synthetic Learning (DSL) architecture to learn across multiple medical centers while protecting sensitive personal information. DSL enables the building of a homogeneous dataset of entirely synthetic medical images via a form of GAN-based synthetic learning. The proposed DSL architecture has the following key functionalities: multi-modality learning, missing modality completion learning, and continual learning. We systematically evaluate the performance of DSL on different medical applications using cardiac computed tomography angiography (CTA), brain tumor MRI, and histopathology nuclei datasets. Extensive experiments demonstrate the superior performance of DSL as a provider of high-quality synthetic medical images, as measured by an ideal synthetic quality metric called Dist-FID. We show that DSL can be adapted to heterogeneous data and remarkably outperforms the segmentation model trained on real misaligned modalities by 55% and the segmentation model trained on temporal datasets by 8%.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't

MeSH terms

Angiography
Brain Neoplasms*
Cell Nucleus
Computed Tomography Angiography
Humans
Learning*
