Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

Hang Wang, Zhen Xiang, David Jonathan Miller, George Kesidis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Deep neural networks are vulnerable to backdoor attacks (Trojans), in which an attacker poisons the training set with backdoor triggers so that the network learns to classify test-time inputs containing the trigger to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting and abnormally large activations in the attacked model. This motivates a general, post-training clipping method for backdoor mitigation, i.e., one that places bounds on internal-layer activations, with the bounds learned from a small set of clean samples. We devise a new approach of this kind, choosing the activation bounds to explicitly limit classification margins. This method outperforms peer methods for CIFAR-10 image classification. We also show that it is strongly robust against adaptive attacks and X2X attacks, and across different datasets. Finally, we demonstrate an extension of the method for test-time detection and correction, based on the output differences between the original and activation-bounded networks. The code for our method is available online.
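As a rough illustration of the two ideas in the abstract (bounding internal activations, then comparing the original and bounded networks at test time), the following is a minimal NumPy sketch on a toy two-layer ReLU network. It is not the authors' implementation. The function names `forward`, `fit_bounds`, and `detect_and_correct`, the toy weights, and the data are all hypothetical. In particular, the bounds here are simply the per-neuron maximum of clean activations, a stand-in heuristic, whereas the paper chooses bounds to explicitly limit classification margins.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, bound=None):
    """Toy two-layer ReLU network; optionally clip hidden activations at `bound`."""
    h = np.maximum(x @ W1 + b1, 0.0)      # ReLU hidden activations
    if bound is not None:
        h = np.minimum(h, bound)          # per-neuron upper clipping
    return h @ W2 + b2                    # class logits

def fit_bounds(clean_x, W1, b1, q=100.0):
    """Stand-in heuristic (an assumption, not the paper's rule):
    per-neuron q-th percentile of activations on clean samples."""
    h = np.maximum(clean_x @ W1 + b1, 0.0)
    return np.percentile(h, q, axis=0)

def detect_and_correct(x, params, bound):
    """Flag inputs whose predicted class changes under clipping,
    and return the clipped network's prediction as the correction."""
    orig = forward(x, *params).argmax(axis=1)
    clipped = forward(x, *params, bound=bound).argmax(axis=1)
    return orig != clipped, clipped

# Hypothetical toy model: hidden neuron 2 acts as a "backdoor" neuron
# that pushes inputs toward class 1 when its activation is large.
W1, b1 = np.eye(3), np.zeros(3)
W2, b2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]), np.zeros(2)
params = (W1, b1, W2, b2)

# A small set of clean samples; feature 2 stays small on clean data.
clean_x = np.array([[1.0, 0.0, 0.10],
                    [0.0, 1.0, 0.00],
                    [0.8, 0.3, 0.05],
                    [0.2, 0.9, 0.10]])
bound = fit_bounds(clean_x, W1, b1)

# First row carries a large "trigger" in feature 2; second row is clean.
test_x = np.array([[1.0, 0.2, 5.00],
                   [0.8, 0.3, 0.05]])
flags, preds = detect_and_correct(test_x, params, bound)
print(flags, preds)
```

In this toy setting the triggered input is flagged because clipping removes the abnormally large activation and changes the predicted class, while the clean input is left unchanged. A real model would need bounds at several layers and a principled way to choose them.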

Original language: English (US)
Title of host publication: 34th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2024 - Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350372250
State: Published - 2024
Event: 34th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2024 - London, United Kingdom
Duration: Sep 22 2024 to Sep 25 2024

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 34th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2024
Country/Territory: United Kingdom
City: London
Period: 9/22/24 to 9/25/24

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Signal Processing
