Correcting the distribution of batch normalization signals for Trojan mitigation

Research output: Contribution to journalArticlepeer-review

Abstract

Backdoor (Trojan) attacks represent a significant adversarial threat to deep neural networks (DNNs). In such attacks, the presence of an attacker's backdoor trigger causes a test instance to be misclassified into the attacker's chosen target class. Post-training mitigation methods aim to rectify these misclassifications, ensuring that poisoned models correctly classify backdoor-triggered samples. These methods require the defender to have access to a small, clean dataset and the potentially compromised DNN. However, most defenses rely on parameter fine-tuning, making their effectiveness dependent on the dataset size available to the defender. To overcome the limitations of existing approaches, we propose a method that rectifies misclassifications by correcting the altered distribution of internal layer activations of backdoor-triggered instances. Distribution alterations are corrected by applying simple transformations to internal activations. Notably, our method does not modify any trainable parameters of the DNN, yet it achieves generally good mitigation performance against various backdoor attacks and benchmarks. Consequently, our approach demonstrates robustness even with a limited amount of clean data, making it highly practical for real-world applications. The effectiveness of our approach is validated through both theoretical analysis and extensive experimentation. The appendix is provided as an electronic component and can be accessed via the link in the footnote. The source codes can be found in the link at the footnote.

Original languageEnglish (US)
Article number128752
JournalNeurocomputing
Volume614
DOIs
StatePublished - Jan 21 2025

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Correcting the distribution of batch normalization signals for Trojan mitigation'. Together they form a unique fingerprint.

Cite this