Hyper-flexible Convolutional Neural Networks based on Generalized Lehmer and Power Means

Vagan Terziyan; Diana Malyk; Mariia Golovianko; Vladyslav Branytskyi

doi:10.1016/j.neunet.2022.08.017

Neural Netw. 2022 Nov:155:177-203. doi: 10.1016/j.neunet.2022.08.017. Epub 2022 Aug 23.

Authors

Vagan Terziyan¹, Diana Malyk², Mariia Golovianko³, Vladyslav Branytskyi⁴

Affiliations

¹ Faculty of Information Technology, University of Jyväskylä, Finland. Electronic address: vagan.terziyan@jyu.fi.
² Department of Artificial Intelligence, Kharkiv National University of Radio Electronics, Ukraine. Electronic address: diana.malyk@nure.ua.
³ Department of Artificial Intelligence, Kharkiv National University of Radio Electronics, Ukraine. Electronic address: mariia.golovianko@nure.ua.
⁴ Department of Artificial Intelligence, Kharkiv National University of Radio Electronics, Ukraine. Electronic address: vladyslav.branytskyi@nure.ua.

PMID: 36058022
DOI: 10.1016/j.neunet.2022.08.017

Abstract

The convolutional neural network (CNN) is one of the best-known members of the deep learning family of neural network architectures and is used for many purposes, including image classification. Despite their wide adoption, such networks are known to be highly tuned to their training data (samples representing a particular problem) and are poorly reusable for new problems. One way to change this is to complement the trainable weights with trainable parameters of the mathematical functions that simulate various neural computations within such networks. This makes it possible to distinguish between narrowly focused, task-specific parameters (weights) and more generic, capability-specific parameters. In this paper, we suggest two flexible mathematical functions (the Generalized Lehmer Mean and the Generalized Power Mean) with trainable parameters to replace some of the fixed operations (such as the ordinary arithmetic mean or simple weighted aggregation) that are traditionally used within various components of a convolutional neural network architecture. We call the overall architecture with this update a hyper-flexible convolutional neural network. We provide mathematical justification for the various components of this architecture and show experimentally that it performs better than the traditional one, including better robustness to adversarial perturbations of the test data.

Keywords: Adversarial robustness; Convolutional Neural Network; Flexibility; Lehmer Mean; Pooling; Power Mean.
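
To illustrate why a mean with a tunable exponent can stand in for a fixed aggregation such as average pooling, the sketch below implements the classical (textbook) Lehmer mean and power mean for positive inputs. These are not the generalized forms proposed in the paper, whose exact parameterization is not given in this abstract; the function names `lehmer_mean` and `power_mean`, the sample `window`, and the chosen exponents are assumptions made for illustration only.

```python
import math
from typing import Sequence


def _check_positive(xs: Sequence[float]) -> None:
    # Both means below are defined here only for non-empty, strictly positive inputs.
    if len(xs) == 0 or any(x <= 0 for x in xs):
        raise ValueError("expected a non-empty sequence of positive numbers")


def lehmer_mean(xs: Sequence[float], p: float) -> float:
    """Classical Lehmer mean: sum(x^p) / sum(x^(p-1))."""
    _check_positive(xs)
    numerator = sum(x ** p for x in xs)
    denominator = sum(x ** (p - 1) for x in xs)
    return numerator / denominator


def power_mean(xs: Sequence[float], p: float) -> float:
    """Classical power mean: (mean(x^p))^(1/p), geometric mean at p = 0."""
    _check_positive(xs)
    if p == 0:
        # The limit p -> 0 of the power mean is the geometric mean.
        return math.exp(sum(math.log(x) for x in xs) / len(xs))
    return (sum(x ** p for x in xs) / len(xs)) ** (1.0 / p)


# Hypothetical pooling window of positive activations.
window = [1.0, 2.0, 4.0]

# Sweeping the exponent moves each mean between min-like and max-like behavior.
for p in (-50.0, 0.0, 1.0, 50.0):
    print(p, lehmer_mean(window, p), power_mean(window, p))
```

With `p = 1` both functions reduce to the arithmetic mean, and as `p` grows large and positive (or large and negative) they approach the maximum (or minimum) of the window. A single exponent therefore interpolates between average-like and max-like aggregation, which is the kind of flexibility the abstract attributes to replacing fixed operations with parameterized means. In a trainable setting the exponent would be learned by gradient descent alongside the weights; that step is omitted here.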

MeSH terms

Machine Learning*
Neural Networks, Computer*