Breaking medical data sharing boundaries by using synthesized radiographs

Sci Adv. 2020 Dec 2;6(49):eabb7973. doi: 10.1126/sciadv.abb7973. Print 2020 Dec.

Abstract

Computer vision (CV) has the potential to change medicine fundamentally. Expert knowledge provided by CV can enhance diagnosis. Unfortunately, existing algorithms often fall short of expectations because the databases used for training are usually too small, incomplete, and heterogeneous in quality. Moreover, data protection is a serious obstacle to the exchange of data. To overcome this limitation, we propose using generative models (GMs) to produce high-resolution synthetic radiographs that do not contain any personally identifiable information. Blinded analyses by CV and radiology experts confirmed the high similarity of synthesized and real radiographs. Combining pooled GMs improves the performance of CV algorithms trained on smaller datasets, and integrating synthesized data into patient data repositories can compensate for underrepresented disease entities. By integrating federated learning strategies, even hospitals with few datasets can contribute to and benefit from GM training.

Publication types

  • Research Support, Non-U.S. Gov't