TY - GEN
T1 - Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection
AU - Wang, Hang
AU - Xiang, Zhen
AU - Miller, David Jonathan
AU - Kesidis, George
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the network learns to classify test-time samples containing the trigger to the attacker's designated target class. Recent work shows that backdoor poisoning induces overfitting/abnormally large activations in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method outperforms peer methods on CIFAR-10 image classification. We also show that this method is robust against adaptive attacks and X2X attacks, and generalizes to other datasets. Finally, we demonstrate a method extension for test-time detection and correction based on the output differences between the original and activation-bounded networks. The code for our method is available online.
AB - Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the network learns to classify test-time samples containing the trigger to the attacker's designated target class. Recent work shows that backdoor poisoning induces overfitting/abnormally large activations in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method outperforms peer methods on CIFAR-10 image classification. We also show that this method is robust against adaptive attacks and X2X attacks, and generalizes to other datasets. Finally, we demonstrate a method extension for test-time detection and correction based on the output differences between the original and activation-bounded networks. The code for our method is available online.
UR - http://www.scopus.com/inward/record.url?scp=85210568318&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85210568318&partnerID=8YFLogxK
U2 - 10.1109/MLSP58920.2024.10734765
DO - 10.1109/MLSP58920.2024.10734765
M3 - Conference contribution
AN - SCOPUS:85210568318
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - 34th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2024 - Proceedings
PB - IEEE Computer Society
T2 - 34th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2024
Y2 - 22 September 2024 through 25 September 2024
ER -