Principal Component Adversarial Example

IEEE Trans Image Process. 2020 Feb 28. doi: 10.1109/TIP.2020.2975918. Online ahead of print.

Abstract

Despite achieving excellent performance on various tasks, deep neural networks have been shown to be susceptible to adversarial examples, i.e., visual inputs crafted with structured yet imperceptible noise. To explain this phenomenon, previous works point to the weak capability of classification models and the difficulty of classification tasks. These explanations account for some empirical observations but offer little insight into the intrinsic nature of adversarial examples, such as how they are generated and why they transfer. Furthermore, previous works generate adversarial examples by relying entirely on a specific classifier (model). Consequently, the attack ability of the resulting adversarial examples is strongly dependent on that classifier; more importantly, adversarial examples cannot be generated without a trained classifier at all. In this paper, we raise a question: what is the real cause of the generation of adversarial examples? To answer this question, we propose a new concept, called the adversarial region, which explains the existence of adversarial examples as perturbations perpendicular to the tangent plane of the data manifold. This view yields a clear explanation of why adversarial examples transfer across different models. Moreover, building on the notion of the adversarial region, we propose a novel target-free method for generating adversarial examples via principal component analysis. We verify the adversarial region hypothesis on a synthetic dataset and demonstrate through extensive experiments on real datasets that the adversarial examples generated by our method exhibit competitive or even stronger transferability compared with model-dependent adversarial example generation methods. Moreover, our experiments show that the proposed method is more robust against defense methods than previous methods.
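To make the abstract's idea concrete, the sketch below illustrates one way a target-free, PCA-based perturbation could be constructed: fit PCA to the data, treat the low-variance principal directions as an approximation of the directions normal to the data manifold (the "adversarial region" direction described above), and push a sample along them. This is only a minimal illustration under those assumptions, not the paper's actual algorithm; the function name, the `epsilon` budget, and the `n_tangent` split between tangent and normal subspaces are hypothetical choices.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_off_manifold_perturbation(data, x, epsilon=0.1, n_tangent=50):
    """Perturb a sample x away from the data manifold estimated by PCA.

    data      : (N, D) array of flattened images approximating the data manifold
    x         : (D,) flattened input to perturb
    epsilon   : L2 budget of the added noise (hypothetical parameter)
    n_tangent : number of leading principal components treated as the tangent
                (on-manifold) subspace; the remaining components serve as a
                crude stand-in for the normal (off-manifold) directions
    """
    pca = PCA().fit(data)
    components = pca.components_          # (n_components, D), rows = principal directions

    # Low-variance directions: an approximation of the normal space of the manifold.
    normal_dirs = components[n_tangent:]

    # Draw a random combination of normal directions and scale it to the budget.
    rng = np.random.default_rng(0)
    coeffs = rng.standard_normal(normal_dirs.shape[0])
    delta = coeffs @ normal_dirs
    delta *= epsilon / (np.linalg.norm(delta) + 1e-12)

    # The perturbation is built without querying any classifier (target-free).
    return np.clip(x + delta, 0.0, 1.0)   # keep a valid image range
```

Note that, consistent with the abstract's claim, no trained classifier appears anywhere in this construction; the perturbation direction is derived purely from the data distribution.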