Attention-based investigation and solution to the trade-off issue of adversarial training

Neural Netw. 2024 Jun:174:106224. doi: 10.1016/j.neunet.2024.106224. Epub 2024 Mar 2.

Abstract

Adversarial training has become the mainstream method for boosting the adversarial robustness of deep models. However, it often suffers from a trade-off dilemma, where the use of adversarial examples hurts the standard generalization of models on natural data. To study this phenomenon, we investigate it from the perspective of spatial attention. In brief, standard training typically encourages a model to conduct a comprehensive check of the input space, whereas adversarial training often causes a model to concentrate excessively on sparse spatial regions. This narrowed attention helps to avoid the accumulation of adversarial perturbations, but it easily leads the model to ignore abundant discriminative information, thereby resulting in weak generalization. To address this issue, this paper introduces an Attention-Enhanced Learning Framework (AELF) for robustness training. The main idea is to enable the model to inherit the attention pattern of a standard pre-trained model through an embedding-level regularization. Specifically, given a teacher model built on natural examples, the embedding distribution of the teacher model is used as a static constraint to regulate the embedding outputs of the objective model. This design is mainly motivated by the observation that the embedding feature of a standard model is usually recognized as a rich semantic integration of the input. For implementation, we present a simplified AELF that achieves the regularization with a single cross-entropy loss via parameter initialization and a parameter update strategy, avoiding the extra consistency comparison between embedding vectors. Experimental observations verify the rationality of our argument, and experimental results demonstrate that AELF achieves remarkable improvements in generalization while maintaining high-level robustness.
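The embedding-level regularization described in the abstract can be illustrated with a short sketch. The PyTorch-style code below is an illustrative assumption rather than the paper's released implementation: the model interfaces (`embed`, `head`), the loss name, and the weighting coefficient `lambda_emb` are hypothetical names introduced here for clarity.

```python
# Illustrative sketch (not the paper's code): adversarial training with an
# embedding-level regularizer that anchors the objective (student) model's
# embeddings to those of a frozen teacher trained on natural data.
# `embed`, `head`, and `lambda_emb` are hypothetical names for this example.
import torch
import torch.nn.functional as F

def aelf_style_loss(student, teacher, x_adv, y, lambda_emb=1.0):
    # Teacher embeddings serve as a static constraint: no gradients flow to it.
    with torch.no_grad():
        t_emb = teacher.embed(x_adv)
    s_emb = student.embed(x_adv)        # embeddings of the robust model
    logits = student.head(s_emb)        # classification logits
    ce = F.cross_entropy(logits, y)     # robustness objective on adversarial data
    reg = F.mse_loss(s_emb, t_emb)      # pull embeddings toward the teacher's semantics
    return ce + lambda_emb * reg
```

The simplified AELF mentioned in the abstract instead initializes the objective model from the teacher's parameters and constrains the parameter updates so that a single cross-entropy loss suffices; that initialization-and-update schedule is not reproduced in this sketch.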

Keywords: Adversarial noise; Adversarial robustness; Adversarial training; Deep neural networks; Image classification; Standard generalization.

MeSH terms

  • Entropy
  • Generalization, Psychological*
  • Learning*
  • Semantics