Building generalized linear models with ultrahigh dimensional features: A sequentially conditional approach

Biometrics. 2020 Mar;76(1):47-60. doi: 10.1111/biom.13122. Epub 2019 Nov 6.

Abstract

Conditional screening approaches have emerged as a powerful alternative to the commonly used marginal screening, as they can identify marginally weak but conditionally important variables. However, most existing conditional screening methods need to fix the initial conditioning set, which may determine the ultimately selected variables. If the conditioning set is not properly chosen, the methods may produce false negatives and positives. Moreover, screening approaches typically need to involve tuning parameters and extra modeling steps in order to reach a final model. We propose a sequential conditioning approach by dynamically updating the conditioning set with an iterative selection process. We provide its theoretical properties under the framework of generalized linear models. Powered by an extended Bayesian information criterion as the stopping rule, the method will lead to a final model without the need to choose tuning parameters or threshold parameters. The practical utility of the proposed method is examined via extensive simulations and analysis of a real clinical study on predicting multiple myeloma patients' response to treatment based on their genomic profiles.
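The iterative selection described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian linear model (the simplest GLM) so that the maximized log-likelihood has a closed form, and it uses the extended BIC in the form -2 log L + k log n + 2γ log C(p, k). The function names, the default γ = 1, and the `max_size` cap are all hypothetical choices made for illustration.

```python
import math
import numpy as np


def gaussian_loglik(X, y, cols):
    """Maximized Gaussian log-likelihood for an intercept plus the columns in `cols`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


def log_binom(p, k):
    """log of the binomial coefficient C(p, k), computed via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(loglik, n, p, k, gamma=1.0):
    """Extended BIC for a model with k of the p candidate predictors."""
    return -2.0 * loglik + k * math.log(n) + 2.0 * gamma * log_binom(p, k)


def sequential_select(X, y, gamma=1.0, max_size=20):
    """Grow the conditioning set one variable at a time; stop when EBIC stops decreasing."""
    n, p = X.shape
    selected = []
    best = ebic(gaussian_loglik(X, y, []), n, p, 0, gamma)
    while len(selected) < min(max_size, p):
        candidates = [j for j in range(p) if j not in selected]
        # Score each candidate conditionally on the variables already selected.
        logliks = [gaussian_loglik(X, y, selected + [j]) for j in candidates]
        i = int(np.argmax(logliks))
        score = ebic(logliks[i], n, p, len(selected) + 1, gamma)
        if score >= best:  # EBIC did not improve: stop and keep the current model
            break
        selected.append(candidates[i])
        best = score
    return selected


# Simulated example: only columns 2 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + rng.standard_normal(200)
selected = sequential_select(X, y)
print(selected)
```

Because each candidate is scored jointly with the variables already in the model, a predictor that is weak marginally can still enter once its conditioning partners are present. The stopping point comes from the criterion itself rather than from a user-chosen threshold.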

Keywords: extended Bayesian information criteria; high-dimensional predictors; predictive modeling; sequential conditioning; sure screening properties.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Biometry / methods*
  • Computer Simulation
  • Gene Expression Profiling / statistics & numerical data
  • Humans
  • Likelihood Functions
  • Linear Models*
  • Logistic Models
  • Models, Statistical
  • Multiple Myeloma / genetics
  • Multiple Myeloma / therapy