A Framework of Learning Through Empirical Gain Maximization

Yunlong Feng; Qiang Wu

doi:10.1162/neco_a_01384

Neural Comput. 2021 May 13;33(6):1656-1697. doi: 10.1162/neco_a_01384.

Authors

Yunlong Feng¹, Qiang Wu²

Affiliations

¹ Department of Mathematics and Statistics, State University of New York at Albany, Albany, NY 12222, U.S.A. ylfeng@albany.edu.
² Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, TN 37132, U.S.A. qwu@mtsu.edu.

PMID: 34496383
DOI: 10.1162/neco_a_01384

Abstract

In this letter, we develop a framework of empirical gain maximization (EGM) to address the robust regression problem in which heavy-tailed noise or outliers may be present in the response variable. The idea of EGM is to approximate the density function of the noise distribution rather than approximating the truth function directly, as is usually done. Unlike classical maximum likelihood estimation, which gives equal importance to all observations and can be problematic in the presence of abnormal observations, EGM schemes can be interpreted from a minimum distance estimation viewpoint and allow such observations to be ignored. Furthermore, we show that several well-known robust nonconvex regression paradigms, such as Tukey regression and truncated least squares regression, can be reformulated within this new framework. We then develop a learning theory for EGM, by means of which a unified analysis can be conducted for these well-established but not fully understood regression approaches. The new framework leads to a novel interpretation of existing bounded nonconvex loss functions. Within it, two seemingly unrelated notions, the well-known Tukey's biweight loss for robust regression and the triweight kernel for nonparametric smoothing, turn out to be closely related. More precisely, we show that Tukey's biweight loss can be derived from the triweight kernel. Other bounded nonconvex loss functions frequently employed in machine learning, such as the truncated square loss, the Geman-McClure loss, and the exponential squared loss, can also be reformulated from certain smoothing kernels in statistics. In addition, the new framework enables us to devise new bounded nonconvex loss functions for robust learning.
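
The link between Tukey's biweight loss and the triweight kernel can be checked numerically. The sketch below is an illustration only, not the authors' code. It assumes the usual textbook forms of the triweight kernel and of Tukey's biweight loss. The function names, the scaling constant, and the cutoff value 4.685 (a common tuning choice) are assumptions introduced here and do not come from the abstract. Under these assumptions the loss is an affine, decreasing function of the kernel, so maximizing the average kernel value of the residuals (an "empirical gain") is the same as minimizing the average biweight loss.

```python
def triweight_kernel(u):
    """Triweight smoothing kernel: (35/32) * (1 - u^2)^3 on [-1, 1], else 0."""
    if abs(u) > 1.0:
        return 0.0
    return (35.0 / 32.0) * (1.0 - u * u) ** 3


def tukey_biweight_loss(t, c):
    """Tukey's biweight loss in its usual textbook form, with cutoff c > 0."""
    if abs(t) > c:
        return c * c / 6.0
    return (c * c / 6.0) * (1.0 - (1.0 - (t / c) ** 2) ** 3)


def kernel_scale(c):
    """Constant that maps kernel values to loss values (assumed normalization)."""
    return (c * c / 6.0) * (32.0 / 35.0)


def loss_from_kernel(t, c):
    """Biweight loss rebuilt from the kernel: scale * (K(0) - K(t / c))."""
    return kernel_scale(c) * (triweight_kernel(0.0) - triweight_kernel(t / c))


def empirical_gain(residuals, c):
    """Average kernel value of the scaled residuals (hypothetical helper)."""
    return sum(triweight_kernel(r / c) for r in residuals) / len(residuals)


def mean_biweight_loss(residuals, c):
    """Average Tukey biweight loss of the residuals."""
    return sum(tukey_biweight_loss(r, c) for r in residuals) / len(residuals)


if __name__ == "__main__":
    c = 4.685  # assumed cutoff; a common tuning constant for Tukey's biweight
    residuals = [0.1, -0.5, 2.0, 30.0]  # the last value plays the role of an outlier
    for r in residuals:
        print(r, tukey_biweight_loss(r, c), loss_from_kernel(r, c))
    print("empirical gain:", empirical_gain(residuals, c))
    print("mean loss:", mean_biweight_loss(residuals, c))
```

In this sketch a residual beyond the cutoff contributes zero gain and a constant loss, so it has no influence on where the maximum lies, which is one way to read the abstract's remark that such observations can be ignored.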