Identifying disease-causing mutations with privacy protection

Mete Akgün; Ali Burak Ünal; Bekir Ergüner; Nico Pfeifer; Oliver Kohlbacher

doi:10.1093/bioinformatics/btaa641

Bioinformatics. 2021 Jan 29;36(21):5205-5213. doi: 10.1093/bioinformatics/btaa641.

Authors

Mete Akgün¹,², Ali Burak Ünal², Bekir Ergüner³, Nico Pfeifer²,⁴,⁵, Oliver Kohlbacher¹,⁴,⁶,⁷

Affiliations

¹ Translational Bioinformatics, University Hospital Tübingen, Tübingen 72026, Germany.
² Methods in Medical Informatics, Dept. of Computer Science, University of Tübingen, Tübingen 72026, Germany.
³ CeMM Research Center for Molecular Medicine, Austrian Academy of Sciences, Vienna, Austria.
⁴ Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen 72026, Germany.
⁵ Statistical Learning in Computational Biology, Max Planck Institute for Informatics, Saarbrücken 66123, Germany.
⁶ Applied Bioinformatics, Dept. of Computer Science, University of Tübingen, Tübingen 72026, Germany.
⁷ Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen 72026, Germany.

Abstract

Motivation: The use of genome data for diagnosis and treatment is becoming increasingly common. Researchers need access to as many genomes as possible to interpret a patient's genome, to detect statistical patterns and to reveal disease-gene relationships. The sensitive information contained in genome data and the high risk of re-identification increase the privacy and security concerns associated with sharing such data. In this article, we present an approach to identify disease-associated variants and genes while ensuring patient privacy. The proposed method uses secure multi-party computation to find disease-causing mutations under specific inheritance models without sacrificing the privacy of individuals. It discloses only the variants or genes that result from the analysis. Thus, the vast majority of patient data can be kept private.
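The abstract does not name the underlying cryptographic primitive. As a hedged illustration of the general idea behind secure multi-party computation, the minimal sketch below uses two-party additive secret sharing modulo 2^32, an assumption chosen for illustration and not necessarily the scheme used in this work. Each patient splits a per-variant 0/1 carrier indicator into two random-looking shares held by two non-colluding servers; the servers add their shares locally, and only the per-variant totals are reconstructed. The function names and toy cohort are hypothetical.

```python
import secrets

# Assumed ring: arithmetic modulo 2**32 (illustrative choice, not from the paper).
MODULUS = 2 ** 32


def share(value: int) -> tuple[int, int]:
    """Split value into two additive shares; each share alone is uniformly random."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def reconstruct(s0: int, s1: int) -> int:
    """Recombine two additive shares into the original value."""
    return (s0 + s1) % MODULUS


def private_carrier_counts(patients: list[list[int]]) -> list[int]:
    """Hypothetical helper: every patient secret-shares a 0/1 indicator per
    variant between two non-colluding servers. Each server only ever sees
    random-looking shares and sums them locally; only the per-variant
    totals are reconstructed and revealed."""
    n_variants = len(patients[0])
    server0 = [0] * n_variants
    server1 = [0] * n_variants
    for indicators in patients:
        for j, bit in enumerate(indicators):
            s0, s1 = share(bit)
            server0[j] = (server0[j] + s0) % MODULUS
            server1[j] = (server1[j] + s1) % MODULUS
    return [reconstruct(a, b) for a, b in zip(server0, server1)]


if __name__ == "__main__":
    # Hypothetical toy data: 3 patients x 4 variants (1 = carries the variant).
    cohort = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 1, 1]]
    print(private_carrier_counts(cohort))  # [3, 1, 2, 1]
```

Additive sharing is used here only because it is the simplest scheme in which addition needs no interaction between the servers; comparisons such as those needed for a maximum require additional protocol steps that this sketch omits.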

Results: Our prototype implementation analyses thousands of genomes in milliseconds, and the runtime scales logarithmically with the number of patients. We present the first privacy-preserving analyses of genomic data based on inheritance models (recessive, dominant and compound heterozygous) to find disease-causing mutations. Furthermore, we re-implement the privacy-preserving methods (MAX, SETDIFF and INTERSECTION) proposed in a previous study. Our MAX, SETDIFF and INTERSECTION implementations are 2.5, 1122 and 341 times faster, respectively, than the corresponding operations of the state-of-the-art protocol.
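For orientation, the functions that the privacy-preserving protocols evaluate can be written in plaintext. The sketch below is a non-private baseline under assumed definitions, not the authors' method. Genotypes are encoded as 0/1/2 (homozygous reference, heterozygous, homozygous alternate). Recessive: homozygous alternate in every affected member and in no unaffected member. Dominant: heterozygous in every affected member and reference in every unaffected member. Compound heterozygous: a gene in which the affected child carries one heterozygous variant inherited only from the father and another inherited only from the mother. MAX, SETDIFF and INTERSECTION are given assumed semantics: the gene affected in the most patients, the child's variants absent from both parents, and the items shared by all patients. All function names and data are hypothetical.

```python
from collections import Counter

# Assumed encoding: 0 = hom. reference, 1 = heterozygous, 2 = hom. alternate.
# Variants missing from an individual's dict are treated as reference (0).
Genotypes = dict[str, int]  # variant id -> genotype for one individual


def recessive(affected: list[Genotypes], unaffected: list[Genotypes]) -> set[str]:
    """Hom.-alternate in every affected member, in no unaffected member."""
    cands = set.intersection(*({v for v, g in p.items() if g == 2} for p in affected))
    return {v for v in cands if all(p.get(v, 0) != 2 for p in unaffected)}


def dominant(affected: list[Genotypes], unaffected: list[Genotypes]) -> set[str]:
    """Heterozygous in every affected member, reference in every unaffected one."""
    cands = set.intersection(*({v for v, g in p.items() if g == 1} for p in affected))
    return {v for v in cands if all(p.get(v, 0) == 0 for p in unaffected)}


def compound_heterozygous(child: Genotypes, father: Genotypes,
                          mother: Genotypes, gene_of: dict[str, str]) -> set[str]:
    """Genes with one child het variant from the father only and one from the mother only."""
    from_father, from_mother = set(), set()
    for v, g in child.items():
        if g != 1:
            continue
        f, m = father.get(v, 0), mother.get(v, 0)
        if f == 1 and m == 0:
            from_father.add(gene_of[v])
        elif m == 1 and f == 0:
            from_mother.add(gene_of[v])
    return from_father & from_mother


def max_gene(patient_genes: list[set[str]]) -> str:
    """MAX (assumed semantics): the gene affected in the largest number of patients."""
    counts = Counter(g for genes in patient_genes for g in genes)
    return max(counts, key=counts.get)


def setdiff(child: set[str], *parents: set[str]) -> set[str]:
    """SETDIFF (assumed semantics): child variants absent from all parents."""
    return child.difference(*parents)


def intersection(*patients: set[str]) -> set[str]:
    """INTERSECTION (assumed semantics): items shared by all patients."""
    return set.intersection(*patients)


if __name__ == "__main__":
    # Hypothetical trio with an affected child and unaffected parents.
    father = {"v1": 1, "v2": 0, "v3": 1, "v4": 0}
    mother = {"v1": 1, "v2": 1, "v3": 0, "v4": 0}
    child = {"v1": 2, "v2": 1, "v3": 1, "v4": 1}
    gene_of = {"v1": "GENE_A", "v2": "GENE_B", "v3": "GENE_B", "v4": "GENE_C"}
    print(recessive([child], [father, mother]))                # {'v1'}
    print(dominant([child], [father, mother]))                 # {'v4'}
    print(compound_heterozygous(child, father, mother, gene_of))  # {'GENE_B'}
```

In the privacy-preserving setting, the inputs to these functions stay protected and only their outputs (the returned variants or genes) are disclosed.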

Availability and implementation: https://gitlab.com/DIFUTURE/privacy-preserving-genomic-diagnosis.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Confidentiality
Genome-Wide Association Study
Genomics*
Humans
Mutation
Privacy*
