Sparse Multicategory Generalized Distance Weighted Discrimination in Ultra-High Dimensions

Entropy (Basel). 2020 Nov 5;22(11):1257. doi: 10.3390/e22111257.

Abstract

Distance weighted discrimination (DWD) is an appealing classification method that is capable of overcoming data piling problems in high-dimensional settings. Variable selection in multicategory classification is especially challenging when various sparsity structures are assumed in these settings. In this paper, we propose a multicategory generalized DWD (MgDWD) method that maintains intrinsic variable group structures during selection using a sparse group lasso penalty. Theoretically, we derive minimizer uniqueness for the penalized MgDWD loss function and consistency properties for the proposed classifier. We further develop an efficient algorithm based on the proximal operator to solve the optimization problem. The performance of MgDWD is evaluated using finite-sample simulations and miRNA data from an HIV study.
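
For readers unfamiliar with proximal methods, the following is a minimal sketch of the standard proximal operator for a sparse group lasso penalty applied to a single coefficient group. It is an illustration under stated assumptions, not the algorithm developed in the paper: the function name `prox_sparse_group_lasso`, the parameters `step`, `lam_l1`, and `lam_group`, and the treatment of one group as a plain list of coefficients are all hypothetical choices made here.

```python
import math


def prox_sparse_group_lasso(v, step, lam_l1, lam_group):
    """Prox of step * (lam_l1 * ||x||_1 + lam_group * ||x||_2) for one group v.

    Hypothetical helper for illustration; names and grouping are assumptions.
    """
    # Step 1: elementwise soft-thresholding handles the lasso (L1) part.
    t = step * lam_l1
    soft = [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

    # Step 2: group soft-thresholding shrinks the whole block toward zero.
    norm = math.sqrt(sum(x * x for x in soft))
    g = step * lam_group
    if norm <= g:
        # The entire group is removed, which is how group-level selection occurs.
        return [0.0] * len(v)
    scale = 1.0 - g / norm
    return [scale * x for x in soft]


# Example: one group of three coefficients.
shrunk = prox_sparse_group_lasso([3.0, -4.0, 0.5], step=1.0, lam_l1=1.0, lam_group=1.0)
dropped = prox_sparse_group_lasso([0.5, -0.6], step=1.0, lam_l1=1.0, lam_group=1.0)
```

In a proximal gradient scheme, an update of this kind would typically follow a gradient step on the smooth loss, with the operator applied separately to each group. The two-stage form shows why the penalty yields sparsity both within groups (individual entries set to zero) and across groups (whole blocks set to zero).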

Keywords: DWD; L2-consistency; high dimension; multicategory classification; proximal algorithm; sparse group lasso.