Approaches to Regularized Regression - A Comparison between Gradient Boosting and the Lasso

Methods Inf Med. 2016 Oct 17;55(5):422-430. doi: 10.3414/ME16-01-0033. Epub 2016 Sep 14.

Abstract

Background: Penalization and regularization techniques for statistical modeling have attracted increasing attention in biomedical research due to their advantages in the presence of high-dimensional data. A special focus lies on algorithms that incorporate automatic variable selection, such as the least absolute shrinkage and selection operator (lasso) or statistical boosting techniques.

Objectives: Focusing on the linear regression framework, this article compares the two most common techniques for this task, the lasso and gradient boosting, from both a methodological and a practical perspective.

Methods: We describe these methods, highlighting the circumstances under which their results coincide in low-dimensional settings. In addition, we carry out extensive simulation studies comparing their performance in settings with more predictors than observations, and we investigate multiple combinations of noise-to-signal ratio and number of true non-zero coefficients. Finally, we examine the impact of different tuning methods on the results.

Results: Both methods carry out penalization and variable selection for possibly high-dimensional data, often resulting in very similar models. An advantage of the lasso is its faster run-time; a strength of the boosting concept is its modular nature, which makes it easy to extend to other regression settings.

Conclusions: Although they follow different strategies with respect to optimization and regularization, both methods imply similar constraints on the estimation problem, leading to comparable performance in prediction accuracy and variable selection in practice.

Keywords: Penalization; boosting; high-dimensional data; lasso; regularization; variable selection.

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Models, Theoretical
  • Numerical Analysis, Computer-Assisted
  • Regression Analysis*
  • Reproducibility of Results
  • Signal Processing, Computer-Assisted
  • Software