Less Is More: Adaptive Trainable Gradient Dropout for Deep Neural Networks

Sensors (Basel). 2023 Jan 24;23(3):1325. doi: 10.3390/s23031325.

Abstract

The undeniable computational power of artificial neural networks has granted the scientific community the ability to exploit available data in ways previously inconceivable. However, deep neural networks require an overwhelming quantity of data in order to learn the underlying relationships within them and thus complete the specific task they have been assigned. Feeding a deep neural network with vast amounts of data usually improves performance, but it may also harm the network's ability to generalize. To tackle this, numerous regularization techniques have been proposed, with dropout being one of the most dominant. This paper proposes a selective gradient dropout method which, instead of dropping random weights, learns to freeze the training of specific connections, thereby increasing the network's sparsity in an adaptive manner and driving it to utilize more salient weights. The experimental results show that the produced sparse network outperforms the baseline on numerous image classification datasets, and that these results are obtained after significantly fewer training epochs.

Keywords: adaptive dropout; gradient dropout; gradient freezing; trainable dropout.
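
The abstract does not specify the authors' adaptive selection mechanism, but the core idea of gradient dropout (zeroing the gradients of chosen connections so their training is frozen while the forward pass is untouched) can be illustrated with a minimal PyTorch-style sketch. The class name GradientDropoutLinear and the fixed random drop_rate below are hypothetical illustrations, not the paper's trainable, adaptive criterion.

```python
# Minimal sketch (not the authors' exact algorithm): gradient dropout in PyTorch.
# A subset of a layer's weight gradients is zeroed on each backward pass, so the
# corresponding connections are temporarily "frozen" while the rest keep training.
import torch
import torch.nn as nn

class GradientDropoutLinear(nn.Linear):
    """Linear layer whose weight gradients are partially dropped during backprop."""

    def __init__(self, in_features, out_features, drop_rate=0.3):
        super().__init__(in_features, out_features)
        self.drop_rate = drop_rate  # hypothetical fixed rate; the paper learns this adaptively
        # A hook on the weight tensor lets us mask its gradient as it is computed.
        self.weight.register_hook(self._drop_gradients)

    def _drop_gradients(self, grad):
        # Keep each gradient entry with probability (1 - drop_rate); zero the rest,
        # freezing the update of those connections for this step only.
        keep_mask = (torch.rand_like(grad) > self.drop_rate).float()
        return grad * keep_mask

# Usage: the forward pass is a normal linear layer; only gradient flow is selectively frozen.
layer = GradientDropoutLinear(128, 10, drop_rate=0.3)
x = torch.randn(32, 128)
loss = layer(x).sum()
loss.backward()  # roughly drop_rate of layer.weight.grad entries are exactly zero
```

In this sketch the mask is drawn at random each step; the paper's contribution is to make that selection trainable so that less salient connections are frozen preferentially, yielding an adaptively sparse network.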

Grants and funding

This work was supported by the European Commission (INTREPID, Intelligent Toolkit for Reconnaissance and assessmEnt in Perilous Incidents) under Grant 883345.