Privacy preserving Generative Adversarial Networks to model Electronic Health Records

Neural Netw. 2022 Sep:153:339-348. doi: 10.1016/j.neunet.2022.06.022. Epub 2022 Jun 25.

Abstract

Hospitals and General Practitioner (GP) surgeries within National Health Services (NHS), collect patient information on a routine basis to create personal health records such as family medical history, chronic diseases, medications and dosing. The collected information could be used to build and model various machine learning algorithms, to simplify the task of those working within the NHS. However, such Electronic Health Records are not made publicly available due to privacy concerns. In our paper, we propose a privacy-preserving Generative Adversarial Network (pGAN), which can generate synthetic data of high quality, while preserving the privacy and statistical properties of the source data. pGAN is evaluated on two distinct datasets, one posing as a Classification task, and the other as a Regression task. Privacy score of generated data is calculated using the Nearest Neighbour Adversarial Accuracy. Cosine similarity scores of synthetic data from our proposed model indicate that the data generated is similar in nature, but not identical. Additionally, our proposed model was able to preserve privacy while maintaining high utility. Machine learning models trained on both synthetic data and original data have achieved accuracies of 74.3% and 74.5% respectively on the classification dataset; while they have attained an R2-Score of 0.84 and 0.85 on synthetic and original data of the regression task respectively. Our results, therefore, indicate that synthetic data from the proposed model could replace the use of original data for machine learning while preserving privacy.

Keywords: AI; GAN; Machine learning; Privacy; Public health data.

MeSH terms

  • Algorithms
  • Data Collection
  • Electronic Health Records*
  • Humans
  • Machine Learning
  • Privacy*