Variational Bayes for high-dimensional proportional hazards models with applications within gene expression

Michael Komodromos; Eric O Aboagye; Marina Evangelou; Sarah Filippi; Kolyan Ray

doi:10.1093/bioinformatics/btac416

Bioinformatics. 2022 Aug 10;38(16):3918-3926. doi: 10.1093/bioinformatics/btac416.

Authors

Michael Komodromos¹, Eric O Aboagye², Marina Evangelou¹, Sarah Filippi¹, Kolyan Ray¹

Affiliations

¹ Department of Mathematics, Imperial College London, London SW7 2AZ, UK.
² Department of Surgery and Cancer, Imperial College London, London W12 0NN, UK.

Abstract

Motivation: Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Such methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at high (unscalable) computational expense.

Results: We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as sparse variational Bayes. Our method, based on a mean-field variational approximation, overcomes the high computational cost of Markov chain Monte Carlo, whilst retaining useful features, providing a posterior distribution for the parameters and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk.
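As an illustration of how posterior inclusion probabilities support variable selection and risk assessment, the following minimal Python sketch assumes a mean-field spike-and-slab family in which each coefficient is, independently, a Gaussian N(mu_j, sigma_j^2) with probability gamma_j and exactly zero otherwise. The function names, the 0.5 selection threshold and the Gaussian slab are assumptions made for this example; the sketch is not the survival.svb implementation and does not fit the model, it only summarizes variational parameters that are assumed to be already fitted.

```python
import numpy as np


def summarize_spike_and_slab(mu, sigma, gamma, threshold=0.5):
    """Summarize an (assumed) fitted mean-field spike-and-slab posterior.

    mu, sigma : slab means and standard deviations, one per coefficient
    gamma     : posterior inclusion probabilities, one per coefficient
    threshold : hypothetical cut-off for declaring a variable selected
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    gamma = np.asarray(gamma, dtype=float)

    # Select variables whose inclusion probability exceeds the threshold.
    selected = np.flatnonzero(gamma > threshold)

    # Mean and variance of a two-component mixture: point mass at 0 and N(mu, sigma^2).
    post_mean = gamma * mu
    post_var = gamma * (sigma**2 + mu**2) - post_mean**2
    return selected, post_mean, post_var


def sample_risk_scores(X, mu, sigma, gamma, n_draws=1000, seed=0):
    """Draw log relative risk scores X @ beta from the variational posterior.

    Returns an array of shape (n_draws, n_patients); the spread of each column
    reflects the uncertainty in that patient's risk score.
    """
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    gamma = np.asarray(gamma, dtype=float)

    rng = np.random.default_rng(seed)
    p = mu.shape[0]
    included = rng.random((n_draws, p)) < gamma          # inclusion indicators
    slab = rng.normal(mu, sigma, size=(n_draws, p))      # slab draws
    beta = included * slab                               # excluded coefficients are exactly 0
    return beta @ X.T
```

Quantiles of each column returned by `sample_risk_scores` give a credible interval for the corresponding patient's log relative risk under these assumptions.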

Availability and implementation: Our method has been implemented as a freely available R package, survival.svb (https://github.com/mkomod/survival.svb).

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem*
Gene Expression
Humans
Markov Chains
Monte Carlo Method
Proportional Hazards Models
