Nonparametric Bayes modeling for case control studies with many predictors

Biometrics. 2016 Mar;72(1):184-92. doi: 10.1111/biom.12411. Epub 2015 Sep 22.

Abstract

It is common in biomedical research to run case-control studies involving high-dimensional predictors, with the main goal being detection of the sparse subset of predictors that are significantly associated with disease. Usual analyses rely on independent screening, which considers each predictor one at a time, or in some cases on logistic regression assuming no interactions. We propose a fundamentally different approach based on a nonparametric Bayesian low-rank tensor factorization model for the retrospective likelihood. Our model treats the joint distribution of the predictors as unknown and characterizes it flexibly, without the linearity assumptions of logistic regression. Predictors are excluded only if they have no impact on disease risk, either directly or through interactions with other predictors. Hence, we obtain an omnibus approach for screening for important predictors. Computation relies on an efficient Gibbs sampler. The methods are shown to have high power and low false discovery rates in simulation studies, and we consider an application to an epidemiology study of birth defects.
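The abstract does not spell out the factorization. The following is therefore only a minimal sketch under an assumed form: a rank-k latent class (PARAFAC) model for the retrospective likelihood, P(x | y) = sum over h of nu[y, h] * prod over j of lam[j][h, x_j], in which case status y shifts only the latent class weights. The names `retrospective_pmf`, `likelihood_ratio`, `nu` and `lam`, and all numbers, are hypothetical. The sketch is not the authors' implementation or prior specification, and it omits the Gibbs sampler entirely. It illustrates the exclusion property stated above. If predictor j has the same category probabilities in every latent class, its factor cancels from P(x | case) / P(x | control), so it carries no information about disease risk. A predictor whose probabilities vary across classes can move that ratio, either on its own or jointly with other predictors.

```python
import numpy as np


def retrospective_pmf(x, y, nu, lam):
    """P(x | y) under an assumed rank-k latent class (PARAFAC) model.

    x   : sequence of p category indices, one per predictor
    y   : case status, 0 = control, 1 = case
    nu  : array (2, k); nu[y, h] = P(latent class h | y)        (hypothetical)
    lam : list of p arrays (k, d_j); lam[j][h, c] = P(x_j = c | class h)
    """
    # Row j of per_pred holds lam[j][h, x_j] for every latent class h.
    per_pred = np.array([lam[j][:, c] for j, c in enumerate(x)])  # shape (p, k)
    # Predictors are conditionally independent given the class; mix over classes.
    return float(nu[y] @ per_pred.prod(axis=0))


def likelihood_ratio(x, nu, lam):
    """P(x | case) / P(x | control); proportional to the disease odds given x."""
    return retrospective_pmf(x, 1, nu, lam) / retrospective_pmf(x, 0, nu, lam)


# Hypothetical rank-3 model with three categorical predictors.
nu = np.array([[0.6, 0.3, 0.1],   # class weights among controls
               [0.2, 0.3, 0.5]])  # class weights among cases
lam = [
    np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]),                 # predictor 0: varies by class
    np.array([[0.2, 0.3, 0.5], [0.3, 0.4, 0.3], [0.6, 0.2, 0.2]]),  # predictor 1: varies by class
    np.tile([0.7, 0.3], (3, 1)),                                    # predictor 2: same in every class
]

# Changing the inactive predictor 2 leaves the ratio unchanged (up to rounding) ...
print(likelihood_ratio([0, 0, 0], nu, lam), likelihood_ratio([0, 0, 1], nu, lam))
# ... while changing predictor 0 moves it.
print(likelihood_ratio([0, 0, 0], nu, lam), likelihood_ratio([1, 0, 0], nu, lam))
```

By Bayes' rule, P(y = 1 | x) / P(y = 0 | x) equals this ratio times the prior odds, so a predictor that never changes the ratio cannot change disease odds. The low-rank form is what keeps this tractable with many predictors. The full joint table for each case status has (prod_j d_j) - 1 free probabilities, whereas the sketch uses 2(k - 1) + k * sum_j (d_j - 1).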

Keywords: Bayesian nonparametrics; Big data; Epidemiology; Retrospective likelihood; Sparse parallel factor analysis model; Tensor factorization.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bayes Theorem*
  • Case-Control Studies*
  • Computer Simulation
  • Congenital Abnormalities / epidemiology*
  • Data Interpretation, Statistical
  • Humans
  • Incidence
  • Infant, Newborn
  • Models, Statistical*
  • Reproducibility of Results
  • Risk Assessment / methods
  • Sample Size
  • Sensitivity and Specificity
  • Statistics, Nonparametric*