Privacy preserving validation for multiomic prediction models

Talal Ahmed; Mark A Carty; Stephane Wenric; Jonathan R Dry; Ameen A Salahudeen; Aly A Khan; Eric Lefkofsky; Martin C Stumpe; Raphael Pelossof

doi:10.1093/bib/bbac110

Brief Bioinform. 2022 May 13;23(3):bbac110. doi: 10.1093/bib/bbac110.

Authors

Talal Ahmed¹, Mark A Carty¹, Stephane Wenric¹, Jonathan R Dry¹, Ameen A Salahudeen¹, Aly A Khan¹, Eric Lefkofsky¹, Martin C Stumpe¹, Raphael Pelossof¹

Affiliation

¹ Tempus Labs Inc., Chicago, IL 60654, USA.

Abstract

Reproducibility of results obtained using ribonucleic acid (RNA) data across labs remains a major hurdle in cancer research. Often, molecular predictors trained on one dataset cannot be applied to another due to differences in RNA library preparation and quantification, which inhibits the validation of predictors across labs. While current RNA correction algorithms reduce these differences, they require simultaneous access to patient-level data from all datasets, so sharing a predictor also requires sharing its training data. Here, we describe SpinAdapt, an unsupervised RNA correction algorithm that enables the transfer of molecular models without requiring access to patient-level data. It computes data corrections using only aggregate statistics of each dataset, thereby maintaining patient data privacy. Despite an inherent trade-off between privacy and performance, SpinAdapt outperforms current correction methods, such as Seurat and ComBat, on publicly available cancer studies, including TCGA and ICGC. Furthermore, SpinAdapt can correct new samples, thereby enabling unbiased evaluation on validation cohorts. We expect this novel correction paradigm to enhance research reproducibility and to preserve patient privacy.
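To make the idea of correcting data from aggregate statistics alone more concrete, the following is a minimal sketch and not SpinAdapt itself. It assumes a much simpler stand-in technique, per-gene mean and standard deviation matching, in which each lab shares only per-gene summaries and never patient-level rows. The function names `summarize` and `correct` and the toy expression values are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def summarize(samples):
    """Reduce a dataset (rows = samples, columns = genes) to per-gene aggregates.

    Only this summary would leave the lab; the patient-level rows stay private.
    """
    columns = list(zip(*samples))
    return {
        "mean": [fmean(col) for col in columns],
        "std": [pstdev(col) for col in columns],
    }


def correct(sample, source_stats, target_stats, eps=1e-12):
    """Map one source-lab sample onto the target lab's per-gene location and scale.

    Assumption: simple moment matching, used here as an illustrative stand-in.
    """
    corrected = []
    for x, m_src, s_src, m_tgt, s_tgt in zip(
        sample,
        source_stats["mean"],
        source_stats["std"],
        target_stats["mean"],
        target_stats["std"],
    ):
        # Standardize with source aggregates; guard against zero variance.
        z = (x - m_src) / (s_src if s_src > eps else 1.0)
        # Re-express on the target lab's scale.
        corrected.append(z * s_tgt + m_tgt)
    return corrected


# Toy data (hypothetical values): two samples, two genes per lab.
target_lab = [[1.0, 10.0], [3.0, 14.0]]  # lab that trained the predictor
source_lab = [[10.0, 0.0], [30.0, 4.0]]  # lab that wants to validate it

target_stats = summarize(target_lab)  # shared in place of the training data
source_stats = summarize(source_lab)  # computed locally on the validation side

# A new sample is corrected with fixed aggregates, so no refitting is needed.
new_sample = [20.0, 2.0]
corrected_sample = correct(new_sample, source_stats, target_stats)
```

Because the correction depends only on the two summaries, a held-out sample can be corrected without recomputing anything on the combined data, which mirrors the abstract's point about correcting new samples for unbiased evaluation on validation cohorts.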

Keywords: machine learning; model validation; privacy; reproducibility; transcriptomics; translational research.

MeSH terms

Algorithms
Confidentiality*
Humans
Privacy*
RNA
Reproducibility of Results

Substances

RNA