Anomaly detection of adversarial examples using class-conditional generative adversarial networks

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Deep neural networks (DNNs) have been shown to be vulnerable to Test-Time Evasion attacks (TTEs, or adversarial examples), which alter the DNN's decision by making small changes to the input. We propose an unsupervised attack detector for DNN classifiers based on class-conditional Generative Adversarial Networks (GANs). We model the distribution of clean data, conditioned on the predicted class label, with an Auxiliary Classifier GAN (AC-GAN). Given a test sample and its predicted class, three detection statistics are calculated from the AC-GAN generator and discriminator. Experiments on image classification datasets under various TTE attacks show that our method outperforms previous detection methods. We also investigate the effectiveness of anomaly detection using different DNN layers (input features or internal-layer features) and demonstrate, as one might expect, that anomalies are harder to detect using features closer to the DNN's output layer. Finally, we also investigate our approach for more general out-of-distribution detection.
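
As an illustration only, the class-conditional, unsupervised decision step described above might be sketched as follows. The abstract does not define the three detection statistics or say how they are thresholded, so everything here is an assumption: a single scalar anomaly score per sample stands in for a statistic, the threshold is an empirical quantile of scores on clean data, and the names `fit_thresholds` and `is_anomalous` are hypothetical.

```python
from collections import defaultdict


def fit_thresholds(clean_scores, clean_labels, false_positive_rate=0.05):
    """Fit one threshold per predicted class using clean data only.

    Assumption: higher scores mean "more anomalous". The threshold for a
    class is an empirical quantile of that class's clean scores, chosen so
    that at most roughly `false_positive_rate` of clean samples exceed it.
    """
    by_class = defaultdict(list)
    for score, label in zip(clean_scores, clean_labels):
        by_class[label].append(score)

    thresholds = {}
    for label, scores in by_class.items():
        scores.sort()
        # Position of the (1 - FPR) quantile, clamped to the last element.
        k = min(len(scores) - 1, int((1.0 - false_positive_rate) * len(scores)))
        thresholds[label] = scores[k]
    return thresholds


def is_anomalous(score, predicted_label, thresholds):
    """Flag a test sample whose score exceeds its predicted class's threshold."""
    return score > thresholds[predicted_label]


# Toy usage with made-up scores for two predicted classes.
clean_scores = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
clean_labels = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = fit_thresholds(clean_scores, clean_labels, false_positive_rate=0.5)
flag = is_anomalous(3.5, 0, thresholds)
```

No attack examples are used to fit the thresholds, which is what makes a detector of this kind unsupervised. Conditioning on the predicted class lets a score that is typical for one class still be flagged when the classifier predicts another.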

Original language: English (US)
Article number: 102956
Journal: Computers and Security
Volume: 124
State: Published - Jan 2023

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Law
