Differential Private Deep Learning Models for Analyzing Breast Cancer Omics Data

Md Mohaiminul Islam; Noman Mohammed; Yang Wang; Pingzhao Hu

doi:10.3389/fonc.2022.879607

Front Oncol. 2022 Jun 23:12:879607. doi: 10.3389/fonc.2022.879607. eCollection 2022.

Authors

Md Mohaiminul Islam¹, Noman Mohammed², Yang Wang², Pingzhao Hu¹ ² ³ ⁴

Affiliations

¹ Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada.
² Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada.
³ Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada.
⁴ Research Institute for Oncology and Hematology, CancerCare Manitoba, Winnipeg, MB, Canada.

Abstract

Proper analysis of high-dimensional human genomic data is necessary to increase human knowledge about fundamental biological questions such as disease associations and drug sensitivity. However, such data contain sensitive private information about individuals and can be used to uniquely identify an individual (i.e., a privacy violation). Therefore, raw genomic datasets cannot be published openly or shared freely with researchers. The recent success of deep learning (DL) in diverse problems has shown its suitability for analyzing large volumes of high-dimensional genomic data. Still, DL-based models can leak information about their training samples. To overcome this challenge, differential privacy mechanisms can be incorporated into the DL analysis framework, as differential privacy can protect individuals' privacy. We proposed a differential privacy-based DL framework to solve two biological problems: breast cancer status (BCS) and cancer type (CT) classification, and drug sensitivity prediction. To predict BCS and CT using genomic data, we built a differentially private (DP) deep autoencoder (dpAE) on private gene expression datasets; it learns a low-dimensional representation of the data. We used the dpAE features to build multiple DP binary classifiers that predict BCS and CT for an individual. To predict drug sensitivity, we used the Genomics of Drug Sensitivity in Cancer (GDSC) dataset. We extracted dpAE features from GDSC to build our DP drug sensitivity prediction model for 265 drugs. Evaluation of the proposed DP framework shows that it achieves better performance in predicting BCS, CT, and drug sensitivity than previously published DP work.

Keywords: Rényi differential privacy; breast cancer; deep learning; differential privacy; omics data.
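
The abstract does not state how differential privacy is injected into autoencoder training. As an illustration only, the sketch below shows one common approach, DP-SGD (per-sample gradient clipping followed by Gaussian noise), applied to a small linear autoencoder in NumPy. The choice of DP-SGD, the linear architecture, the function names (`per_sample_grads`, `clip_grads`, `dp_sgd_step`), and all hyperparameter values are assumptions made for this example and are not taken from the paper. Privacy accounting (for example, a Rényi DP accountant that converts the noise multiplier and number of steps into an epsilon) is deliberately left out.

```python
import numpy as np


def per_sample_grads(X, W_enc, W_dec):
    """Per-sample gradients of 0.5 * ||x W_enc W_dec - x||^2 (linear autoencoder)."""
    H = X @ W_enc                                    # (n, k) latent codes
    R = H @ W_dec - X                                # (n, d) reconstruction residuals
    g_dec = np.einsum("nk,nd->nkd", H, R)            # (n, k, d) decoder gradients
    g_enc = np.einsum("nd,nk->ndk", X, R @ W_dec.T)  # (n, d, k) encoder gradients
    return g_enc, g_dec


def clip_grads(g_enc, g_dec, clip_norm):
    """Scale each sample's full gradient so its L2 norm is at most clip_norm."""
    norms = np.sqrt((g_enc ** 2).sum(axis=(1, 2)) + (g_dec ** 2).sum(axis=(1, 2)))
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return g_enc * scale[:, None, None], g_dec * scale[:, None, None]


def dp_sgd_step(X, W_enc, W_dec, clip_norm=1.0, noise_multiplier=1.1, lr=0.05, rng=None):
    """One DP-SGD update: clip per-sample gradients, sum, add Gaussian noise, average."""
    rng = np.random.default_rng() if rng is None else rng
    g_enc, g_dec = clip_grads(*per_sample_grads(X, W_enc, W_dec), clip_norm)
    n = X.shape[0]
    std = noise_multiplier * clip_norm               # noise scale tied to the clip bound
    noisy_enc = (g_enc.sum(axis=0) + rng.normal(0.0, std, W_enc.shape)) / n
    noisy_dec = (g_dec.sum(axis=0) + rng.normal(0.0, std, W_dec.shape)) / n
    return W_enc - lr * noisy_enc, W_dec - lr * noisy_dec
```

Clipping bounds how much any single sample can move the weights, and the Gaussian noise, scaled to that bound, masks the remaining per-sample contribution. In a pipeline like the one the abstract describes, the latent codes `X @ W_enc` from the trained encoder would play the role of the low-dimensional features passed to downstream classifiers.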