T-BFA: Targeted Bit-Flip Adversarial Weight Attack

IEEE Trans Pattern Anal Mach Intell. 2021 Sep 16:PP. doi: 10.1109/TPAMI.2021.3112932. Online ahead of print.

Abstract

Traditional Deep Neural Network (DNN) security is mostly related to the well-known adversarial input example attack. Recently, another dimension of adversarial attack, namely, attack on DNN weight parameters, has been shown to be very powerful. As a representative one, the Bit-Flip based adversarial weight Attack (BFA) injects an extremely small number of faults into weight parameters to hijack the executing DNN function. Prior works on BFA focus on un-targeted attacks that can force all inputs into a random output class by flipping a very small number of weight bits stored in computer memory. This paper proposes the first targeted BFA-based (T-BFA) adversarial weight attack on DNNs, which can intentionally mislead selected inputs to a target output class. The objective is achieved by identifying the weight bits that are highly associated with classification of a targeted output through a class-dependent weight bit searching algorithm. The performance of the proposed T-BFA is successfully demonstrated on multiple DNN architectures for image classification tasks. For example, by merely flipping 27 out of 88 million weight bits of ResNet-18, our T-BFA can misclassify all the images from the Hen class into the Goose class (i.e., 100% attack success rate) in the ImageNet dataset, while maintaining 59.35% validation accuracy.
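
To make the basic fault model concrete, the following minimal sketch shows what a single bit flip does to a stored weight. It assumes weights are held as 8-bit two's complement integers, which the abstract does not state explicitly. The helper name flip_bit and the sample weight values are hypothetical and chosen only for illustration. The sketch does not implement the class-dependent weight bit searching algorithm; it only illustrates why flipping one well-chosen bit can change a weight drastically.

```python
import numpy as np


def flip_bit(weights: np.ndarray, index: int, bit: int) -> np.ndarray:
    """Return a copy of 1-D int8 `weights` with one bit of weights[index] flipped.

    Hypothetical helper: assumes 8-bit two's complement storage.
    `bit` counts from 0 (least significant) to 7 (sign bit).
    """
    if weights.dtype != np.int8:
        raise TypeError("expected int8 weights")
    if not 0 <= bit < 8:
        raise ValueError("bit must be in [0, 7]")
    flipped = weights.copy()
    raw = flipped.view(np.uint8)       # same bytes, reinterpreted as unsigned
    raw[index] ^= np.uint8(1 << bit)   # XOR toggles exactly one bit
    return flipped


# Illustrative (made-up) quantized weights.
weights = np.array([3, -20, 57], dtype=np.int8)

# Flipping the sign bit of 3 (0b00000011) gives 0b10000011, i.e. -125.
print(flip_bit(weights, index=0, bit=7))   # [-125  -20   57]

# Flipping the least significant bit of 57 only changes it to 56.
print(flip_bit(weights, index=2, bit=0))   # [  3 -20  56]

# Fraction of weight bits touched in the ResNet-18 example from the abstract.
fraction = 27 / 88_000_000
print(f"{fraction:.2e}")
```

The contrast between the two flips suggests why a search over bit positions matters: under this assumed encoding, a flip in the most significant bit moves a weight across almost the whole representable range, while a flip in the least significant bit barely changes it.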