Efficient methods for estimating constrained parameters with applications to lasso logistic regression

Guo-Liang Tian; Man-Lai Tang; Hong-Bin Fang; Ming Tan

doi:10.1016/j.csda.2007.11.007

Comput Stat Data Anal. 2008 Mar 15;52(7):3528-3542. doi: 10.1016/j.csda.2007.11.007.

Authors

Guo-Liang Tian¹, Man-Lai Tang, Hong-Bin Fang, Ming Tan

Affiliation

¹ Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 10 South Pine Street, MSTF Suite 261, Baltimore, Maryland 21201, U.S.A.

Abstract

Fitting logistic regression models is challenging when their parameters are restricted. In this article, we first develop a quadratic lower-bound (QLB) algorithm for optimization with box or linear inequality constraints and derive the fastest QLB algorithm, which corresponds to the smallest global majorization matrix. The proposed QLB algorithm is particularly suited to problems to which EM-type algorithms are not applicable (e.g., logistic, multinomial logistic, and Cox's proportional hazards models), while it retains the ascent property of EM and thus ensures monotonic convergence. Second, we generalize the QLB algorithm to penalized problems in which the penalty functions may not be differentiable everywhere. The proposed method thus provides an alternative algorithm for estimation in lasso logistic regression, where the convergence of the existing lasso algorithm is not generally ensured. Finally, by relaxing the ascent requirement, convergence can be further accelerated. We introduce a pseudo-Newton method that retains the simplicity of the QLB algorithm and the fast convergence of the Newton method. Theoretical justification and numerical examples show that the pseudo-Newton method is up to 71 times faster in CPU time, or up to 107 times faster in number of iterations, than the fastest QLB algorithm, which makes bootstrap variance estimation feasible. Simulations and comparisons are performed, and three real examples (Down syndrome data, kyphosis data, and colon microarray data) are analyzed to illustrate the proposed methods.
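The quadratic lower-bound idea described in the abstract can be illustrated, very roughly, on plain unconstrained logistic regression. The sketch below is not the constrained or penalized algorithm of the article. It assumes the classical bound that the negative Hessian of the logistic log-likelihood is dominated by the fixed matrix B = X'X/4, so that each update theta + B^{-1} * gradient cannot decrease the log-likelihood. All function names and the toy data are hypothetical, and pure Python is used only to keep the example self-contained.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(row, theta):
    return sum(a * b for a, b in zip(row, theta))


def log_likelihood(X, y, theta):
    # Sum of y*z - log(1 + exp(z)), with log(1 + exp(z)) computed stably.
    total = 0.0
    for row, yi in zip(X, y):
        z = dot(row, theta)
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def gradient(X, y, theta):
    # Score vector X'(y - p).
    g = [0.0] * len(theta)
    for row, yi in zip(X, y):
        r = yi - sigmoid(dot(row, theta))
        for j, a in enumerate(row):
            g[j] += r * a
    return g


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def qlb_logistic(X, y, iters=200):
    # Assumed majorization matrix: B = X'X / 4, fixed across iterations.
    q = len(X[0])
    B = [[0.25 * sum(row[j] * row[k] for row in X) for k in range(q)]
         for j in range(q)]
    theta = [0.0] * q
    history = [log_likelihood(X, y, theta)]
    for _ in range(iters):
        step = solve(B, gradient(X, y, theta))
        theta = [t + s for t, s in zip(theta, step)]
        history.append(log_likelihood(X, y, theta))
    return theta, history


# Hypothetical toy data: intercept column plus one covariate, not separable.
X = [[1.0, v] for v in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
y = [0, 0, 1, 0, 1, 1]
theta_hat, history = qlb_logistic(X, y)
print(theta_hat, history[-1])
```

Because B never changes, it only needs to be formed (and could be factorized) once, which is the simplicity the abstract attributes to QLB. The price is linear rather than quadratic convergence, which is consistent with the abstract's motivation for a faster pseudo-Newton variant.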
