Feature-specific inference for penalized regression using local false discovery rates

Stat Med. 2023 Apr 30;42(9):1412-1429. doi: 10.1002/sim.9678. Epub 2023 Feb 3.

Abstract

Penalized regression methods such as the lasso are a popular approach to analyzing high-dimensional data. One attractive property of the lasso is that it naturally performs variable selection. An important area of concern, however, is the reliability of these selections. Motivated by local false discovery rate methodology from the large-scale hypothesis testing literature, we propose a method for calculating a local false discovery rate for each variable under consideration by the lasso model. These rates can be used to assess the reliability of an individual feature, or to estimate the model's overall false discovery rate. The method can be used for any level of regularization. This is particularly useful for models with a few highly significant features but a high overall false discovery rate, a relatively common occurrence when using cross validation to select a model. It is also flexible enough to be applied to many varieties of penalized likelihoods including generalized linear models and Cox regression, and a variety of penalties, including the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty. We demonstrate the validity of this approach and contrast it with other inferential methods for penalized regression as well as with local false discovery rates for univariate hypothesis tests. Finally, we show the practical utility of our method by applying it to a case study involving gene expression in breast cancer patients.
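As a rough illustration of the local false discovery rate idea the abstract borrows from the large-scale testing literature, the sketch below computes the classical univariate quantity fdr(z) = pi0 * f0(z) / f(z), where f0 is the standard normal null density and f is a kernel estimate of the marginal density of the observed statistics. It is not the lasso-specific estimator proposed in the paper, whose details the abstract does not give. The function names, the simulated z-statistics, the rule-of-thumb bandwidth, and the conservative default pi0 = 1 are all assumptions made for illustration. The last function reflects the standard relationship that averaging local rates over a selected set gives an estimate of that set's overall false discovery rate.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density, used here as the assumed null density f0."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kde(zs, bandwidth):
    """Return a Gaussian kernel density estimate of the marginal density f."""
    n = len(zs)

    def f(z):
        return sum(normal_pdf((z - zi) / bandwidth) for zi in zs) / (n * bandwidth)

    return f


def local_fdr(zs, pi0=1.0):
    """Classical local fdr, pi0 * f0(z) / f(z), clipped to at most 1.

    pi0 = 1.0 is a conservative upper bound on the null proportion
    (an assumption of this sketch, not a value taken from the paper).
    """
    n = len(zs)
    mean = sum(zs) / n
    sd = math.sqrt(sum((z - mean) ** 2 for z in zs) / (n - 1))
    bandwidth = 1.06 * sd * n ** (-0.2)  # normal-reference rule of thumb
    f = kde(zs, bandwidth)
    return [min(1.0, pi0 * normal_pdf(z) / f(z)) for z in zs]


def estimated_fdr(fdr_values, selected):
    """Average the local rates over the selected indices."""
    chosen = [fdr_values[i] for i in selected]
    return sum(chosen) / len(chosen) if chosen else 0.0


# Hypothetical data: 450 null statistics and 50 shifted (non-null) ones.
random.seed(0)
z_stats = [random.gauss(0.0, 1.0) for _ in range(450)]
z_stats += [random.gauss(4.0, 1.0) for _ in range(50)]

rates = local_fdr(z_stats)
selected = [i for i, r in enumerate(rates) if r < 0.2]
print(len(selected), estimated_fdr(rates, selected))
```

Each entry of `rates` speaks to the reliability of one statistic, while `estimated_fdr` summarizes a whole selection, mirroring the two uses the abstract describes.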

Keywords: false discovery rates; high-dimensional data; high-dimensional models; lasso; penalized regression.

MeSH terms

  • Breast Neoplasms* / genetics
  • Female
  • Humans
  • Linear Models
  • Probability
  • Regression Analysis
  • Reproducibility of Results