Reverse engineering imperceptible backdoor attacks on deep neural networks for detection and training set cleansing

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Backdoor data poisoning (a.k.a. a Trojan attack) is an emerging form of adversarial attack, typically targeting deep neural network (DNN) image classifiers. The attacker poisons the training set with a relatively small number of images from one (or several) source class(es), each embedded with a backdoor pattern and labeled to a target class. If the attack succeeds, the trained classifier will, during operation: 1) misclassify a test image from a source class to the target class whenever the backdoor pattern is present; and 2) maintain high classification accuracy on backdoor-free test images. In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns (e.g., watermarks) before or during the classifier training phase. This is a challenging problem because it is not known a priori which subset (if any) of the training set has been poisoned. We propose an optimization-based reverse-engineering defense that jointly: 1) detects whether the training set is poisoned; 2) if so, accurately identifies the target class and the training images with the backdoor pattern embedded; and 3) additionally, reverse-engineers an estimate of the backdoor pattern used by the attacker. In benchmark experiments on CIFAR-10 (as well as four other data sets), considering a variety of attacks, our defense achieves a new state of the art, reducing the attack success rate to no more than 4.9% after the detected suspicious training images are removed.
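The core intuition behind a reverse-engineering defense is that a backdoor creates an unusually "short path" from a source class to the target class. A very small additive perturbation is then enough to flip most source images. The sketch below illustrates that idea under heavy simplifying assumptions. A linear softmax classifier with weights `W` and bias `b` stands in for the trained DNN. The objective (a squared-norm penalty minus the mean target-class log-likelihood), the step size, and the `1/||v||` score are illustrative choices. The names `reverse_engineer_pattern` and `detection_statistics` are hypothetical. This is not the authors' algorithm: it omits pixel-range clipping, statistical thresholding, and the step that identifies which training images carry the pattern.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def reverse_engineer_pattern(W, b, X_src, target, lam=0.01, lr=0.5, steps=500):
    """Hypothetical sketch: estimate a small additive pattern v sending X_src to `target`.

    Minimizes  lam * ||v||^2 - mean_n log p(target | x_n + v)
    for a linear softmax classifier with logits  W @ x + b  (stand-in for a DNN).
    Returns the pattern v and the fraction of source images now classified as target.
    """
    K, D = W.shape
    v = np.zeros(D)
    onehot = np.zeros(K)
    onehot[target] = 1.0
    for _ in range(steps):
        P = softmax((X_src + v) @ W.T + b)  # (N, K) class posteriors
        # Gradient of -log p_target(x + v) w.r.t. v is W^T (p - e_target);
        # average it over the source images and add the norm-penalty gradient.
        grad = (P - onehot).mean(axis=0) @ W + 2.0 * lam * v
        v -= lr * grad
    preds = np.argmax((X_src + v) @ W.T + b, axis=1)
    success = float(np.mean(preds == target))
    return v, success


def detection_statistics(W, b, X, y, **kw):
    """Score every (source, target) class pair by 1/||v|| (larger = easier to flip)."""
    K = W.shape[0]
    stats = {}
    for s in range(K):
        for t in range(K):
            if s == t or not np.any(y == s):
                continue
            v, success = reverse_engineer_pattern(W, b, X[y == s], t, **kw)
            stats[(s, t)] = (1.0 / (np.linalg.norm(v) + 1e-12), success)
    return stats
```

In this toy setting, a pair whose statistic is much larger than those of all other pairs is the candidate (source, target) backdoor mapping. A practical detector would calibrate that decision against a null distribution rather than simply taking the maximum.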

Original language: English (US)
Article number: 102280
Journal: Computers and Security
Volume: 106
State: Published - Jul 2021

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Law
