Gradient Correction for White-Box Adversarial Attacks

IEEE Trans Neural Netw Learn Syst. 2023 Oct 11:PP. doi: 10.1109/TNNLS.2023.3315414. Online ahead of print.

Abstract

Deep neural networks (DNNs) play key roles in various artificial intelligence applications such as image classification and object recognition. However, a growing number of studies have shown that there exist adversarial examples in DNNs, which are almost imperceptibly different from the original samples but can greatly change the output of DNNs. Recently, many white-box attack algorithms have been proposed, and most of the algorithms concentrate on how to make the best use of gradients per iteration to improve adversarial performance. In this article, we focus on the properties of the widely used activation function, rectified linear unit (ReLU), and find that there exist two phenomena (i.e., wrong blocking and over transmission) misguiding the calculation of gradients for ReLU during backpropagation. Both issues enlarge the difference between the predicted changes of the loss function from gradients and corresponding actual changes and misguide the optimized direction, which results in larger perturbations. Therefore, we propose a universal gradient correction adversarial example generation method, called ADV-ReLU, to enhance the performance of gradient-based white-box attack algorithms such as fast gradient signed method (FGSM), iterative FGSM (I-FGSM), momentum I-FGSM (MI-FGSM), and variance tuning MI-FGSM (VMI-FGSM). Through backpropagation, our approach calculates the gradient of the loss function with respect to the network input, maps the values to scores, and selects a part of them to update the misguided gradients. Comprehensive experimental results on ImageNet and CIFAR10 demonstrate that our ADV-ReLU can be easily integrated into many state-of-the-art gradient-based white-box attack algorithms, as well as transferred to black-box attacks, to further decrease perturbations measured in the l2 -norm.