Large-Scale Robust Semisupervised Classification

IEEE Trans Cybern. 2019 Mar;49(3):907-917. doi: 10.1109/TCYB.2018.2789420. Epub 2018 Jan 17.

Abstract

Semisupervised learning aims to leverage both labeled and unlabeled data to improve performance, and most existing semisupervised methods are graph-based. However, graph-based semisupervised methods do not scale to large data sets because constructing the graph Laplacian matrix is computationally expensive. In addition, the large amount of unlabeled data used in the training stage of semisupervised learning can introduce substantial uncertainty and potential threats. It is therefore crucial to enhance the robustness of semisupervised classification. In this paper, a novel large-scale robust semisupervised learning method is proposed in the framework of the capped l2,p-norm. This strategy is superior in computational cost, because it makes the graph Laplacian matrix unnecessary, and in robustness to outliers, because the capped l2,p-norm is used to measure the loss. An efficient optimization algorithm is developed to solve the resulting challenging problem, which is both nonconvex and nonsmooth. The complexity of the proposed algorithm is analyzed in detail theoretically. Finally, extensive experiments are conducted on six benchmark data sets to demonstrate the effectiveness and superiority of the proposed method.
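The abstract does not define the capped l2,p-norm. As a rough illustration of why capping the loss helps with outliers, the sketch below assumes one form commonly used in the robust-learning literature: each sample's residual l2-norm is raised to the power p and then clipped at a threshold eps, so a single sample can contribute at most eps to the total. The function names, the threshold eps, and the toy residuals are hypothetical and are not taken from the paper.

```python
import numpy as np


def capped_l2p_loss(residuals, p=1.0, eps=10.0):
    """Sum over rows of min(||r_i||_2 ** p, eps).

    Assumed form of a capped l2,p loss; not quoted from the paper.
    """
    residuals = np.asarray(residuals, dtype=float)
    row_norms = np.linalg.norm(residuals, axis=1)  # l2 norm of each sample's residual
    return float(np.sum(np.minimum(row_norms ** p, eps)))


def uncapped_l2p_loss(residuals, p=1.0):
    """Same loss without the cap, for comparison."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.sum(np.linalg.norm(residuals, axis=1) ** p))


# Toy residuals: two ordinary samples and one gross outlier (row norms 5, 0, 50).
toy_residuals = np.array([[3.0, 4.0], [0.0, 0.0], [30.0, 40.0]])
capped = capped_l2p_loss(toy_residuals, p=1.0, eps=10.0)  # 5 + 0 + 10 = 15
uncapped = uncapped_l2p_loss(toy_residuals, p=1.0)        # 5 + 0 + 50 = 55
```

In this toy case the outlier accounts for 50 of the 55 units of uncapped loss but only 10 of the 15 units of capped loss, which is the sense in which a capped loss limits the influence of any single sample.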