A simple and reliable instance selection for fast training support vector machine: Valid Border Recognition

Neural Netw. 2023 Sep:166:379-395. doi: 10.1016/j.neunet.2023.07.018. Epub 2023 Jul 17.

Abstract

Support vector machines (SVMs) are powerful statistical learning tools, but training them on large datasets can be prohibitively time-consuming. To address this issue, various instance selection (IS) approaches have been proposed, which keep a small fraction of critical instances and screen out the rest before training. However, existing methods struggle to balance accuracy and efficiency. Some miss critical instances, while others rely on complicated selection schemes that take even longer to execute than training on all of the original instances, which defeats the purpose of IS. In this work, we present a new IS method called Valid Border Recognition (VBR). VBR selects the closest heterogeneous neighbors as valid border instances and incorporates this process into the creation of a reduced Gaussian kernel matrix, thus minimizing the execution time. To improve reliability, we also propose a strengthened version of VBR (SVBR). Building on VBR, SVBR gradually adds farther heterogeneous neighbors as complements until the Lagrange multipliers of the already selected instances become stable. In numerical experiments, the effectiveness of the proposed methods is verified on benchmark and synthetic datasets in terms of accuracy, execution time and inference time.
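
The idea of selecting closest heterogeneous neighbors can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the brute-force pairwise distance computation, and the `gamma` parameter are assumptions made for illustration. It treats the nearest instance of a different class as a border instance and reuses the same squared distances to build a Gaussian kernel matrix restricted to the selected instances.

```python
import numpy as np


def closest_heterogeneous_neighbors(X, y):
    """Illustrative only: pick, for every instance, its nearest neighbor
    from a different class, and return the unique selected indices
    together with the full squared-distance matrix.

    Assumes at least two classes are present in y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    # Hide same-class pairs so argmin only sees heterogeneous neighbors
    same_class = y[:, None] == y[None, :]
    masked = np.where(same_class, np.inf, sq_dists)
    nearest = np.argmin(masked, axis=1)
    return np.unique(nearest), sq_dists


def reduced_gaussian_kernel(sq_dists, selected, gamma):
    """Gaussian kernel exp(-gamma * d^2) restricted to the selected
    instances, reusing the distances computed during selection."""
    return np.exp(-gamma * sq_dists[np.ix_(selected, selected)])


# Toy data: two classes on a line; the border lies between x=1 and x=3.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
y = np.array([0, 0, 1, 1])
selected, sq_dists = closest_heterogeneous_neighbors(X, y)
K_reduced = reduced_gaussian_kernel(sq_dists, selected, gamma=0.5)
```

On this toy data the two instances facing the opposite class (indices 1 and 2) are the ones selected, and `K_reduced` is a 2-by-2 matrix that could be passed to an SVM solver accepting a precomputed kernel. The stopping rule that SVBR bases on Lagrange multiplier stability is not shown here.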

Keywords: Distance-based approach; Instance selection; Neighborhood approach; Support vector machine; Valid border instance.

MeSH terms

  • Algorithms*
  • Reproducibility of Results
  • Support Vector Machine*