Cross Entropy in Deep Learning of Classifiers Is Unnecessary – ISBE Error Is All You Need

Władysław Skarbek

doi:10.3390/e26010065

Entropy (Basel). 2024 Jan 12;26(1):65. doi: 10.3390/e26010065.

Author

Władysław Skarbek¹

Affiliation

¹ Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-661 Warszawa, Poland.

PMID: 38248190

Abstract

In deep learning of classifiers, the cost function usually takes the form of a combination of the SoftMax and CrossEntropy functions. The SoftMax unit transforms the scores predicted by the model network into assessments of the degree (probabilities) of an object's membership in a given class. CrossEntropy, in turn, measures the divergence of this prediction from the distribution of target scores. This work introduces the ISBE functionality, which supports the thesis that cross-entropy computation is redundant in deep learning of classifiers. Not only can the entropy calculation be omitted, but during backpropagation there is also no need to route the error through the normalization unit for its backward transformation; instead, the error is sent directly to the model's network. Using examples of perceptron and convolutional networks as classifiers of images from the MNIST collection, it is observed that ISBE does not degrade results, not only with SoftMax but also with other activation functions such as Sigmoid, Tanh, or their hard variants HardSigmoid and HardTanh. Moreover, savings in the total number of operations were observed in both the forward and backward stages. The article is addressed to all deep learning enthusiasts, but primarily to programmers and students interested in the design of deep models. It illustrates, in code snippets, possible ways to implement the ISBE functionality, and it also formally proves that the SoftMax trick applies only to the class of dilated SoftMax functions with relocations.
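The idea behind skipping CrossEntropy can be checked numerically. For a target y whose entries sum to 1, the gradient of CE(SoftMax(z), y) with respect to the scores z equals SoftMax(z) - y (the identity commonly called the SoftMax trick), so the error can be formed directly as "activated scores minus target" without evaluating the entropy or back-propagating through the normalization unit. The following is a minimal plain-Python sketch under stated assumptions, not the paper's code: it assumes the ISBE error is the plain difference between the activated scores and a one-hot target, and all names (`softmax`, `sigmoid`, `hard_sigmoid`, `cross_entropy`, `ce_grad_numeric`, `isbe_error`) are hypothetical. `hard_sigmoid` follows the common clamp(v/6 + 1/2, 0, 1) convention, which is also an assumption. How the paper handles targets for Tanh-type activations is not stated in the abstract, so those are left out.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(z: Vector) -> Vector:
    # Shift by the maximum score for numerical stability; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def _sigmoid_scalar(v: float) -> float:
    # Branch on the sign so math.exp never overflows for large |v|.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def sigmoid(z: Vector) -> Vector:
    return [_sigmoid_scalar(v) for v in z]


def hard_sigmoid(z: Vector) -> Vector:
    # Assumed convention: clamp(v / 6 + 1/2, 0, 1), as in PyTorch's Hardsigmoid.
    return [min(1.0, max(0.0, v / 6.0 + 0.5)) for v in z]


def cross_entropy(p: Vector, y: Vector) -> float:
    # CE(p, y) = -sum_i y_i * log(p_i); entries with y_i == 0 contribute nothing.
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y) if yi > 0)


def ce_grad_numeric(z: Vector, y: Vector, h: float = 1e-6) -> Vector:
    # Central finite differences of CE(softmax(z), y) with respect to each score z_i:
    # the error a full SoftMax + CrossEntropy backward pass delivers to the network.
    grad = []
    for i in range(len(z)):
        zp = list(z)
        zp[i] += h
        zm = list(z)
        zm[i] -= h
        diff = cross_entropy(softmax(zp), y) - cross_entropy(softmax(zm), y)
        grad.append(diff / (2.0 * h))
    return grad


def isbe_error(z: Vector, y: Vector, activation: Callable[[Vector], Vector]) -> Vector:
    # Hypothetical ISBE-style error: activate the scores, subtract the target,
    # and return the difference as the error for the network's output scores.
    # No cross-entropy value is computed and the activation's backward pass is skipped.
    a = activation(z)
    return [ai - yi for ai, yi in zip(a, y)]


if __name__ == "__main__":
    scores = [2.0, -1.0, 0.5]  # hypothetical scores from the model network
    target = [0.0, 0.0, 1.0]   # one-hot target: the object belongs to class 2
    print("ISBE error, SoftMax:  ", isbe_error(scores, target, softmax))
    print("SoftMax+CE gradient:  ", ce_grad_numeric(scores, target))
    print("ISBE error, Sigmoid:  ", isbe_error(scores, target, sigmoid))
    print("ISBE error, HardSigm.:", isbe_error(scores, target, hard_sigmoid))
```

The first two printed vectors should agree up to finite-difference error, which is the SoftMax trick at work. For Sigmoid and HardSigmoid the sketch applies the same subtraction without multiplying by the activation's derivative; whether that trains well is the empirical question the abstract reports on for MNIST.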

Keywords: cross entropy; deep learning; gradient backpropagation; model inference; neural network; normalization function.