Breaking medical data sharing boundaries by using synthesized radiographs

Tianyu Han; Sven Nebelung; Christoph Haarburger; Nicolas Horst; Sebastian Reinartz; Dorit Merhof; Fabian Kiessling; Volkmar Schulz; Daniel Truhn

doi:10.1126/sciadv.abb7973

Sci Adv. 2020 Dec 2;6(49):eabb7973. doi: 10.1126/sciadv.abb7973. Print 2020 Dec.

Authors

Tianyu Han¹, Sven Nebelung², Christoph Haarburger³, Nicolas Horst⁴, Sebastian Reinartz¹ ⁵, Dorit Merhof⁴ ⁶ ⁷, Fabian Kiessling⁶ ⁷ ⁸, Volkmar Schulz⁹ ⁶ ⁷, Daniel Truhn³ ⁵

Affiliations

¹ Physics of Molecular Imaging Systems, Experimental Molecular Imaging, RWTH Aachen University, Aachen, Germany.
² Department of Diagnostic and Interventional Radiology, University Hospital Düsseldorf, Düsseldorf, Germany.
³ Aristra GmbH, Berlin, Germany.
⁴ Institute of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany.
⁵ Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany.
⁶ Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany.
⁷ Comprehensive Diagnostic Center Aachen (CDCA), University Hospital RWTH Aachen, Aachen, Germany.
⁸ Institute for Experimental Molecular Imaging, RWTH Aachen University, Aachen, Germany.
⁹ Physics of Molecular Imaging Systems, Experimental Molecular Imaging, RWTH Aachen University, Aachen, Germany. schulz@pmi.rwth-aachen.de.

Abstract

Computer vision (CV) has the potential to change medicine fundamentally. Expert knowledge provided by CV can enhance diagnosis. Unfortunately, existing algorithms often remain below expectations, as databases used for training are usually too small, incomplete, and heterogeneous in quality. Moreover, data protection is a serious obstacle to the exchange of data. To overcome this limitation, we propose to use generative models (GMs) to produce high-resolution synthetic radiographs that do not contain any personal identification information. Blinded analyses by CV and radiology experts confirmed the high similarity of synthesized and real radiographs. The combination of pooled GM improves the performance of CV algorithms trained on smaller datasets, and the integration of synthesized data into patient data repositories can compensate for underrepresented disease entities. By integrating federated learning strategies, even hospitals with few datasets can contribute to and benefit from GM training.

Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).

Publication types

Research Support, Non-U.S. Gov't