Integrating Biological Knowledge Into Case-Control Analysis Through Iterated Conditional Modes/Medians Algorithm

J Comput Biol. 2020 Jul;27(7):1171-1179. doi: 10.1089/cmb.2019.0319. Epub 2019 Nov 7.

Abstract

Logistic regression is an effective tool in case-control analysis. With advances in high-throughput technology, fast and efficient methods for fitting high-dimensional logistic regression have attracted much interest. This article considers an empirical Bayes model for logistic regression. A spike-and-slab prior is used for variable selection, which plays a vital role in building an effective predictive model while keeping the model interpretable. To increase the power of variable selection, we incorporate biological knowledge through an Ising prior. We propose an iterated conditional modes/medians (ICM/M) algorithm to fit the logistic model; it has a computational advantage over Markov chain Monte Carlo (MCMC) algorithms. An implementation of the ICM/M algorithm for both linear and logistic models is provided in the R package icmm, which is freely available on the Comprehensive R Archive Network (CRAN). Simulation studies were carried out to assess the performance of our method, with the lasso and adaptive lasso as benchmarks. Overall, the simulation studies show that ICM/M produces fewer false positives than the lasso and adaptive lasso and has competitive predictive ability. An application to a real data set from a Parkinson's disease study was also carried out for illustration. To identify important variables, our approach offers the flexibility to select variables based on local posterior probabilities while controlling the false discovery rate at a desired level, rather than relying only on regression coefficients.
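
The selection step described in the last sentence can be illustrated with a minimal Python sketch. It assumes the posterior inclusion probabilities have already been computed, and it uses the standard direct posterior probability rule: rank variables by local false discovery rate (one minus the posterior probability) and keep the largest set whose average local false discovery rate stays at or below the target level. The function name `select_by_local_fdr` and the example probabilities are hypothetical, and the abstract does not confirm that the icmm package applies exactly this rule.

```python
import numpy as np


def select_by_local_fdr(post_prob, alpha=0.05):
    """Pick variables so the estimated Bayesian FDR is at most alpha.

    post_prob: posterior probabilities that each coefficient is nonzero
               (assumed to be supplied by a fitted model).
    Returns the sorted indices of the selected variables.
    """
    post_prob = np.asarray(post_prob, dtype=float)
    local_fdr = 1.0 - post_prob  # posterior probability of a null effect

    # Rank variables from most to least likely to be a true signal.
    order = np.argsort(local_fdr, kind="stable")

    # Average local fdr of the top-k variables estimates the FDR of that set.
    running_mean = np.cumsum(local_fdr[order]) / np.arange(1, len(order) + 1)

    # The running mean is nondecreasing, so the passing sets form a prefix.
    passing = np.nonzero(running_mean <= alpha)[0]
    if passing.size == 0:
        return np.array([], dtype=int)
    k = passing[-1] + 1
    return np.sort(order[:k])


# Hypothetical posterior probabilities for five variables.
selected = select_by_local_fdr([0.99, 0.10, 0.97, 0.60, 0.95], alpha=0.05)
print(selected.tolist())  # -> [0, 2, 4]
```

In this toy example the three variables with posterior probabilities 0.99, 0.97, and 0.95 are kept because their average local false discovery rate is 0.03; adding the fourth-ranked variable would raise the average above 0.05.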

Keywords: empirical Bayes variable selection; genome-wide association studies; iterated conditional modes/medians; logistic regression; single nucleotide polymorphism.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Bayes Theorem
  • Case-Control Studies*
  • Gene Frequency
  • Gene Regulatory Networks
  • Genome-Wide Association Study / statistics & numerical data
  • Genomics / statistics & numerical data*
  • Humans
  • Logistic Models
  • Markov Chains
  • Parkinson Disease / genetics*
  • Polymorphism, Single Nucleotide