Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality

Genetics. 2024 Mar 6;226(3):iyad210. doi: 10.1093/genetics/iyad210.

Abstract

To adhere to and capitalize on the benefits of the FAIR (findable, accessible, interoperable, and reusable) principles in agricultural genome-to-phenome studies, it is crucial to address privacy and intellectual property issues that prevent sharing and reuse of data in research and industry. Direct sharing of genotype and phenotype data is often prohibited due to intellectual property and privacy concerns. Thus, there is a pressing need for encryption methods that obscure confidential aspects of the data, without affecting the outcomes of certain statistical analyses. A homomorphic encryption method for genotypes and phenotypes (HEGP) has been proposed for single-marker regression in genome-wide association studies (GWAS) using linear mixed models with Gaussian errors. This methodology permits frequentist likelihood-based parameter estimation and inference. In this paper, we extend HEGP to broader applications in genome-to-phenome analyses. We show that HEGP is suited to commonly used linear mixed models for genetic analyses of quantitative traits including genomic best linear unbiased prediction (GBLUP) and ridge-regression best linear unbiased prediction (RR-BLUP), as well as Bayesian variable selection methods (e.g. those in Bayesian Alphabet), for genetic parameter estimation, genomic prediction, and GWAS. By advancing the capabilities of HEGP, we offer researchers and industry professionals a secure and efficient approach for collaborative genomic analyses while preserving data confidentiality.

Keywords: GWAS; genomic prediction; homomorphic encryption; joint analysis; mixed model.

MeSH terms

  • Bayes Theorem
  • Confidentiality
  • Genome-Wide Association Study* / methods
  • Genomics* / methods
  • Genotype
  • Likelihood Functions
  • Phenotype