Backdoor Inversion in Neural-Activation Space

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

There are a variety of defenses against backdoor attacks planted in deep neural network (DNN) classifiers via poisoning of the training set. Backdoor-agnostic methods seek to detect and/or mitigate backdoors irrespective of the incorporation mechanism used by the attacker, while inversion methods explicitly assume one. We describe a new detector that: relies on embedded feature representations (neural-activation space) to estimate (invert) the backdoor and to identify its target class; can operate without access to the training set; and is highly effective for various incorporation mechanisms. Our approach is evaluated, and found favorable, in comparison with an array of published defenses for a variety of attacks.
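As a rough illustration of the inversion idea sketched in the abstract, the following numpy example optimizes a minimal-norm perturbation of (synthetic) penultimate-layer activations toward each candidate target class, then flags the class reachable with the smallest perturbation. This is a hedged sketch, not the paper's actual method: the function name, the toy linear head, and the random activations are all hypothetical stand-ins.

```python
import numpy as np

def invert_activation_backdoor(Z, W, b, target, steps=3000, lr=0.05, lam=1e-3):
    """Gradient-descend a shared perturbation `delta` in activation space so
    that the linear head classifies every perturbed sample as `target`,
    while an L2 penalty keeps ||delta|| small.
    Z: (n, d) activations; W: (k, d) head weights; b: (k,) head bias.
    (Hypothetical helper for illustration only.)"""
    k = W.shape[0]
    delta = np.zeros(Z.shape[1])
    onehot = np.eye(k)[target]
    for _ in range(steps):
        logits = (Z + delta) @ W.T + b                # (n, k)
        logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # gradient of mean cross-entropy w.r.t. delta, plus L2 penalty
        grad = (p - onehot).mean(axis=0) @ W + 2.0 * lam * delta
        delta -= lr * grad
    return delta

# Toy demo: random "activations" and a random linear head (synthetic, benign).
rng = np.random.default_rng(0)
n, d, k = 50, 8, 3
Z = rng.normal(size=(n, d))
W = rng.normal(size=(k, d))
b = np.zeros(k)

norms, rates = [], []
for t in range(k):
    delta = invert_activation_backdoor(Z, W, b, t)
    preds = ((Z + delta) @ W.T + b).argmax(axis=1)
    norms.append(float(np.linalg.norm(delta)))
    rates.append(float((preds == t).mean()))

# The class reachable with the smallest perturbation is the backdoor candidate.
candidate = int(np.argmin(norms))
```

In a real detector of this kind, an anomalously small perturbation norm for one class (relative to the others) would flag that class as the likely backdoor target; in this benign toy setup the norms are merely compared, so `candidate` carries no significance.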

Original language: English (US)
Title of host publication: 35th IEEE International Workshop on Machine Learning for Signal Processing
Subtitle of host publication: Signal Processing in the Age of Large Language Models, MLSP 2025
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331570293
DOIs
State: Published - 2025
Event: 35th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2025 - Istanbul, Turkey
Duration: Aug 31, 2025 to Sep 3, 2025

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 35th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2025
Country/Territory: Turkey
City: Istanbul
Period: 8/31/25 to 9/3/25

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Human-Computer Interaction
