LAFIT: Efficient and Reliable Evaluation of Adversarial Defenses With Latent Features

IEEE Trans Pattern Anal Mach Intell. 2024 Jan;46(1):354-369. doi: 10.1109/TPAMI.2023.3323698. Epub 2023 Dec 5.

Abstract

Deep convolutional neural networks (CNNs) can easily be tricked into giving incorrect outputs by adding tiny perturbations to the input that are imperceptible to humans. This susceptibility to adversarial attacks poses significant security risks to deep learning systems and makes hardening CNNs against such attacks a major challenge. A wave of defense strategies has thus been proposed to improve the robustness of CNNs. Current attack methods, however, may fail to evaluate the robustness of defended models accurately or efficiently. In this paper, we therefore propose LAFIT, a unified ℓp white-box attack strategy that harnesses the defender's latent features in its gradient-descent steps and employs a new loss function that normalizes logits to overcome floating-point-induced gradient masking. We show that LAFIT is not only more efficient but also a stronger adversary than the current state of the art when examined across a wide range of defense mechanisms. This suggests that adversarial attacks and defenses may be contingent on the effective use of the defender's hidden components, and that robustness evaluation should no longer view models holistically.
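The core idea described above, gradient-based attack steps guided by the defender's latent features together with a logit-normalizing loss, can be sketched roughly as follows. This is a minimal PyTorch illustration under stated assumptions, not the paper's actual implementation: the ℓ∞ PGD-style loop, the forward hook on a hypothetical feature_layer, the MSE feature-divergence term, the margin form of the loss, and the equal weighting of the two terms are all illustrative choices not given in the abstract.

```python
import torch
import torch.nn.functional as F

def lafit_style_attack(model, feature_layer, x, y,
                       eps=8 / 255, alpha=2 / 255, steps=20):
    """Hypothetical PGD-style l_inf attack combining a latent-feature
    term with a normalized-logit margin loss; a sketch, not LAFIT itself."""
    # Random start inside the l_inf ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)

    # Capture the defender's intermediate features via a forward hook.
    feats = {}
    def hook(module, inputs, output):
        feats["z"] = output
    handle = feature_layer.register_forward_hook(hook)

    with torch.no_grad():
        model(x)
        clean_feats = feats["z"].detach()  # reference features of clean input

    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)

        # Normalize logits so their scale cannot mask gradients through
        # floating-point saturation of softmax/cross-entropy.
        logits = logits / logits.norm(dim=1, keepdim=True).clamp_min(1e-12)

        # Margin between the best wrong class and the true class.
        onehot = F.one_hot(y, logits.size(1)).bool()
        correct = logits[onehot]
        wrong_max = logits.masked_fill(onehot, float("-inf")).amax(dim=1)
        margin_loss = (wrong_max - correct).mean()

        # Latent-feature term: push adversarial features away from clean ones.
        feat_loss = F.mse_loss(feats["z"], clean_feats)

        loss = margin_loss + feat_loss  # equal weighting is an assumption
        grad = torch.autograd.grad(loss, x_adv)[0]

        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()          # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)     # project to l_inf ball
            x_adv = x_adv.clamp(0, 1)                    # valid image range

    handle.remove()
    return x_adv.detach()
```

The two ingredients mirror the abstract's claims: dividing each logit vector by its norm keeps the loss surface well scaled regardless of how large the defender's logits grow, while the feature-divergence term gives the attack a gradient signal from a hidden layer even when the final-layer gradients are uninformative.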