Prediction and Variable Selection in High-Dimensional Misspecified Binary Classification

Entropy (Basel). 2020 May 13;22(5):543. doi: 10.3390/e22050543.

Abstract

In this paper, we consider prediction and variable selection in misspecified binary classification models in the high-dimensional setting. We focus on two approaches to classification that are computationally efficient but lead to model misspecification. The first is to apply penalized logistic regression to classification data that possibly do not follow the logistic model. The second is even more radical: we simply treat the class labels of objects as if they were numbers and apply penalized linear regression. We investigate these two approaches thoroughly and provide conditions that guarantee their success in prediction and variable selection. Our results hold even if the number of predictors is much larger than the sample size. The paper concludes with experimental results.
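As a rough illustration of the second approach, the sketch below treats 0/1 class labels as numbers and fits an L1-penalized (Lasso) linear regression by coordinate descent. It is not taken from the paper: the Lasso penalty, the simulated probit-type data (so neither the linear nor the logistic model is correctly specified), the penalty level, and all function names (`simulate`, `center`, `lasso_on_labels`, `objective`) are assumptions made for this example.

```python
import random


def simulate(n, p, seed=0):
    """Hypothetical data: only predictors 0 and 1 matter, labels follow a probit-type model."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1.0 if row[0] - row[1] + 0.3 * rng.gauss(0.0, 1.0) > 0 else 0.0 for row in X]
    return X, y


def center(X, y):
    """Return the centered predictor columns and the centered labels."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    cols = []
    for j in range(p):
        m = sum(row[j] for row in X) / n
        cols.append([row[j] - m for row in X])
    return cols, [yi - y_mean for yi in y]


def soft_threshold(z, t):
    """Closed-form solution of the one-dimensional Lasso problem."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def objective(cols, yc, beta, lam):
    """Penalized least squares: (1 / (2n)) * ||yc - X beta||^2 + lam * ||beta||_1."""
    n = len(yc)
    resid = list(yc)
    for col, b in zip(cols, beta):
        if b != 0.0:
            resid = [r - b * c for r, c in zip(resid, col)]
    return sum(r * r for r in resid) / (2 * n) + lam * sum(abs(b) for b in beta)


def lasso_on_labels(X, y, lam, n_sweeps=50):
    """Coordinate descent for the Lasso with the class labels used as a numeric response."""
    cols, resid = center(X, y)
    n, p = len(resid), len(cols)
    col_sq = [sum(c * c for c in col) / n for col in cols]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant predictor, nothing to fit
            col = cols[j]
            # Correlation of predictor j with the partial residual (coordinate j excluded).
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


if __name__ == "__main__":
    X, y = simulate(n=100, p=150)  # more predictors than observations
    beta = lasso_on_labels(X, y, lam=0.05)
    print("selected predictors:", [j for j, b in enumerate(beta) if b != 0.0])
```

The first approach would replace the squared loss with the logistic loss under the same kind of penalty; the value `lam=0.05` here is arbitrary and would normally be tuned.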

Keywords: misclassification risk; model misspecification; penalized estimation; supervised classification; variable selection consistency.