Early stopping by correlating online indicators in neural networks

Neural Netw. 2023 Feb;159:109-124. doi: 10.1016/j.neunet.2022.11.035. Epub 2022 Dec 14.

Abstract

To minimize the generalization error of neural networks, we formally introduce a novel technique for identifying overfitting while training the learner. This supports a reliable and trustworthy early stopping condition, thus improving the predictive power of such models. Our proposal exploits the correlation over time in a collection of online indicators, namely characteristic functions that signal whether a set of hypotheses is met, each associated with an independent stopping condition built from a canary judgment that evaluates the presence of overfitting. In this way, we provide a formal basis for deciding when to interrupt the learning process. In contrast to previous approaches focused on a single criterion, we take advantage of the complementarity between independent assessments, seeking both a wider operating range and greater diagnostic reliability. To illustrate the effectiveness of the halting condition described, we work in the sphere of natural language processing, a domain increasingly based on machine learning. As a case study, we focus on parser generation, one of the most demanding and complex tasks in the field. Selecting cross-validation as the canary function enables a direct comparison with the most representative early stopping conditions based on overfitting identification, pointing to a promising start toward optimal bias and variance control.
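
To make the mechanism concrete, the sketch below illustrates the general idea under stated assumptions; it is not the authors' algorithm. Several boolean online indicators (characteristic functions) are derived from a canary signal, here a toy validation curve standing in for cross-validation, and training is interrupted only when the indicators agree in a sustained way over a window of epochs. All names and thresholds (window, quorum, margin, fake_validation_loss) are hypothetical choices made for illustration.

```python
# Minimal sketch: early stopping by sustained agreement among several
# online indicators derived from a canary signal. Hypothetical names
# and thresholds throughout; not the paper's exact procedure.
from collections import deque
from typing import Deque


def no_improvement(history: Deque[float]) -> bool:
    # Hypothesis: the canary loss has not improved across the full window
    # (the oldest value in the window is still the smallest).
    return len(history) == history.maxlen and min(history) == history[0]


def rising_trend(history: Deque[float]) -> bool:
    # Hypothesis: the canary loss is rising on average within the window.
    if len(history) < 2:
        return False
    diffs = [b - a for a, b in zip(history, list(history)[1:])]
    return sum(diffs) / len(diffs) > 0.0


def above_best(history: Deque[float], best: float, margin: float = 0.05) -> bool:
    # Hypothesis: the current canary loss exceeds the best seen by a margin.
    return bool(history) and history[-1] > best * (1.0 + margin)


def fake_validation_loss(epoch: int) -> float:
    # Toy canary curve: improves until epoch 30, then turns upward (overfitting).
    return abs(epoch - 30) / 30.0 + 0.1


def train(max_epochs: int = 100, window: int = 5, quorum: int = 2) -> int:
    """Halt when at least `quorum` indicators agree for a whole window of epochs."""
    history: Deque[float] = deque(maxlen=window)
    agreement: Deque[bool] = deque(maxlen=window)
    best = float("inf")

    for epoch in range(max_epochs):
        val_loss = fake_validation_loss(epoch)  # stand-in for the canary judgment
        history.append(val_loss)
        best = min(best, val_loss)

        votes = [
            no_improvement(history),
            rising_trend(history),
            above_best(history, best),
        ]
        agreement.append(sum(votes) >= quorum)

        # Correlating indicators over time: demand sustained consensus,
        # not a single-epoch spike, before interrupting training.
        if len(agreement) == window and all(agreement):
            return epoch
    return max_epochs


if __name__ == "__main__":
    print("stopped at epoch", train())
```

Requiring a quorum of independent indicators over a window, rather than a single criterion at a single epoch, is what widens the operating range and improves diagnostic reliability in the sense described above: any one indicator may fire spuriously, but sustained agreement among several is a stronger sign of genuine overfitting.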

Keywords: Canary functions; Early stopping; Natural language processing; Neural networks; Overfitting.

MeSH terms

  • Machine Learning*
  • Natural Language Processing
  • Neural Networks, Computer*
  • Reproducibility of Results
  • Software