Preserving fairness and diagnostic accuracy in private large-scale AI models for medical imaging

Soroosh Tayebi Arasteh; Alexander Ziller; Christiane Kuhl; Marcus Makowski; Sven Nebelung; Rickmer Braren; Daniel Rueckert; Daniel Truhn; Georgios Kaissis

doi:10.1038/s43856-024-00462-6


Commun Med (Lond). 2024 Mar 14;4(1):46. doi: 10.1038/s43856-024-00462-6.

Authors

Soroosh Tayebi Arasteh^#¹, Alexander Ziller^#^{2,3}, Christiane Kuhl⁴, Marcus Makowski⁵, Sven Nebelung⁴, Rickmer Braren⁵, Daniel Rueckert⁶, Daniel Truhn⁷, Georgios Kaissis^{8,9,10,11}

Affiliations

¹ Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany. soroosh.arasteh@rwth-aachen.de.
² Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany. alex.ziller@tum.de.
³ Artificial Intelligence in Healthcare and Medicine, Technical University of Munich, Munich, Germany. alex.ziller@tum.de.
⁴ Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.
⁵ Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany.
⁶ Artificial Intelligence in Healthcare and Medicine, Technical University of Munich, Munich, Germany.
⁷ Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany. dtruhn@ukaachen.de.
⁸ Institute of Diagnostic and Interventional Radiology, Technical University of Munich, Munich, Germany. g.kaissis@tum.de.
⁹ Artificial Intelligence in Healthcare and Medicine, Technical University of Munich, Munich, Germany. g.kaissis@tum.de.
¹⁰ Department of Computing, Imperial College London, London, United Kingdom. g.kaissis@tum.de.
¹¹ Institute for Machine Learning in Biomedical Imaging, Helmholtz Munich, Neuherberg, Germany. g.kaissis@tum.de.

^# Contributed equally.

Abstract

Background: Artificial intelligence (AI) models are increasingly used in the medical domain. However, because medical data are highly sensitive, special precautions are required to protect them. The gold standard for privacy preservation is the introduction of differential privacy (DP) into model training. Prior work indicates that DP has negative effects on model accuracy and fairness, which are unacceptable in medicine and represent a major barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training on the accuracy and fairness of AI models compared with non-private training.
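
The abstract does not spell out how DP is introduced into training. A widely used mechanism is differentially private stochastic gradient descent (DP-SGD), which clips each example's gradient to a fixed L2 norm and adds Gaussian noise to the summed gradients before the update. The minimal sketch below illustrates one such step on a toy parameter vector in plain Python. The function names, clipping norm, learning rate, and noise multiplier are illustrative assumptions, not the authors' configuration, and the sketch omits the privacy accounting that turns the noise level into an (ε, δ) guarantee.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one example's gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update (hypothetical helper, not a library API).

    1. Clip every per-example gradient to norm <= max_norm (bounds each patient's influence).
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * max_norm to the sum.
    4. Average over the batch and take a gradient step.
    """
    batch_size = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [w - lr * n / batch_size for w, n in zip(weights, noisy)]


if __name__ == "__main__":
    # Toy example: two parameters, a batch of three per-example gradients.
    rng = random.Random(0)  # seeded only to make the sketch reproducible
    grads = [[3.0, 4.0], [0.0, 0.5], [-1.0, 2.0]]
    new_weights = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                              noise_multiplier=1.1, rng=rng)
    print(new_weights)
```

Setting `noise_multiplier=0.0` reduces the step to ordinary SGD on clipped gradients, which makes the two sources of accuracy loss in private training (clipping bias and added noise) easy to separate. Production frameworks compute per-sample gradients efficiently for deep networks and track the cumulative privacy budget, which this sketch does not attempt.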

Methods: We used two datasets: (1) a large dataset (N = 193,311) of high-quality clinical chest radiographs, and (2) a dataset (N = 1625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs, measured as the area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference.
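
The three evaluation metrics named in the Methods can be illustrated with short, dependency-free functions. This is a minimal sketch: the function names are hypothetical, AUROC is computed through its rank interpretation (the probability that a random positive case scores higher than a random negative one, with ties counted as one half), and the parity difference follows the common convention P(Ŷ = 1 | unprivileged group) − P(Ŷ = 1 | privileged group). Which quantities the study correlated with Pearson's r is not stated in this abstract, so the example inputs are placeholders.

```python
import math


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def statistical_parity_difference(preds, groups, unprivileged, privileged):
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged); 0 means equal positive rates."""
    def positive_rate(group):
        selected = [p for p, g in zip(preds, groups) if g == group]
        if not selected:
            raise ValueError(f"no samples for group {group!r}")
        return sum(selected) / len(selected)

    return positive_rate(unprivileged) - positive_rate(privileged)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Pearson's r is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Placeholder data only; group labels "a" and "b" are hypothetical.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(statistical_parity_difference([1, 0, 1, 1, 0, 0],
                                        ["a", "a", "a", "b", "b", "b"],
                                        unprivileged="a", privileged="b"))
    print(pearson_r([1, 2, 3, 4], [1.2, 1.9, 3.2, 3.9]))
```

In a privacy-fairness comparison, the same metric would be computed per subgroup for both the non-private and the DP model; a growing gap between subgroups under DP would indicate amplified disparity.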

Results: We find that, while privacy-preserving training yields lower accuracy, it largely does not amplify discrimination based on age, sex, or co-morbidity. However, we find an indication that difficult diagnoses and subgroups suffer larger performance drops under private training.

Conclusions: Our study shows that, under the challenging and realistic conditions of a real-life clinical dataset, privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.

Plain language summary

Artificial intelligence (AI), in which computers learn to do tasks that normally require human intelligence, is particularly useful in medical imaging. However, AI should be used in a way that preserves patient privacy. We explored the balance between protecting patient data privacy and maintaining AI performance in medical imaging. We used an approach called differential privacy to protect the privacy of patients' images. We show that, although training AI with differential privacy leads to a slight decrease in accuracy, it does not substantially increase bias against different age groups, genders, or patients with multiple health conditions. However, we observe that AI trained under these privacy constraints has more difficulty accurately diagnosing complex cases and certain subgroups. These findings highlight the importance of designing AI systems that are both privacy-conscious and capable of reliable diagnoses across patient groups.
