Gauge-Optimal Approximate Learning for Small Data Classification

Edoardo Vecchi; Davide Bassetti; Fabio Graziato; Lukáš Pospíšil; Illia Horenko

doi:10.1162/neco_a_01664

Neural Comput. 2024 Apr 17:1-30. doi: 10.1162/neco_a_01664. Online ahead of print.

Authors

Edoardo Vecchi¹, Davide Bassetti², Fabio Graziato³, Lukáš Pospíšil⁴, Illia Horenko⁵

Affiliations

¹ Università della Svizzera Italiana, Faculty of Informatics, Institute of Computing, 6962 Lugano, Switzerland edoardo.vecchi@usi.ch.
² Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany bassetti@mathematik.uni-kl.de.
³ Independent researcher, 22070 Valmorea, Italy fabio.graziato94@gmail.com.
⁴ VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875/17 708 33 Ostrava, Czech Republic lukas.pospisil@vsb.cz.
⁵ Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany horenko@rptu.de.

PMID: 38669692
DOI: 10.1162/neco_a_01664

Abstract

Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, common learning tools struggle to distinguish the features that are important for the classification task from those that bear no relevant information, and they cannot derive an appropriate learning rule for discriminating among different classes. As a potential solution to this problem, we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems in the small data setting. We prove that the optimal solution of the GOAL algorithm consists of piecewise-linear functions in Euclidean space and that it can be approximated through a monotonically convergent algorithm that, under the assumption of a discrete segmentation of the feature space, has a closed-form solution for each optimization substep and an overall iteration cost that scales linearly. The GOAL algorithm is compared with other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (namely, prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.
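
The abstract does not state the GOAL objective or its update rules, so the following is only a toy sketch of the general idea it describes: alternating between closed-form substeps (a discrete segmentation, a set of centroids, and a rotation into a lower-dimensional space) so that the objective never increases. It is not the authors' algorithm and it omits the classification term entirely. The objective ||X - Q C Γ||², the function names, and the parameters `d`, `k`, and `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def objective(X, Q, C, labels):
    """Squared reconstruction error ||X - Q C Gamma||_F^2 (assumed toy objective)."""
    return float(np.sum((X - Q @ C[:, labels]) ** 2))


def alternating_sketch(X, d=2, k=3, n_iter=20, seed=0):
    """Toy alternating minimization; NOT the GOAL algorithm from the paper.

    X is a D x T matrix (features x samples). Q is a D x d matrix with
    orthonormal columns (the "rotation and reduction"), C is a d x k matrix of
    centroids, and labels holds one discrete segment index per sample.
    """
    rng = np.random.default_rng(seed)
    D, T = X.shape
    Q, _ = np.linalg.qr(rng.standard_normal((D, d)))  # random orthonormal start
    Y = Q.T @ X
    C = Y[:, rng.choice(T, size=k, replace=False)].copy()
    labels = np.zeros(T, dtype=int)
    history = []
    for _ in range(n_iter):
        # Substep 1 (closed form): assign each projected sample to its nearest centroid.
        Y = Q.T @ X
        dist = ((Y[:, None, :] - C[:, :, None]) ** 2).sum(axis=0)  # k x T
        labels = dist.argmin(axis=0)
        # Substep 2 (closed form): each centroid becomes the mean of its segment.
        for j in range(k):
            mask = labels == j
            if mask.any():  # an empty segment keeps its previous centroid
                C[:, j] = Y[:, mask].mean(axis=1)
        # Substep 3 (closed form): orthogonal Procrustes update of Q via an SVD.
        M = C[:, labels]  # d x T reconstruction in the reduced space
        U, _, Vt = np.linalg.svd(X @ M.T, full_matrices=False)
        Q = U @ Vt
        history.append(objective(X, Q, C, labels))
    return Q, C, labels, history
```

Because each substep exactly minimizes the assumed objective over one block of variables while the others are held fixed, the values in `history` are non-increasing up to floating-point error. For fixed `D`, `d`, and `k`, one sweep costs time linear in the number of samples `T`, which mirrors (but does not reproduce) the monotone convergence and linear iteration cost claimed in the abstract.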