Abstract
Deep neural networks (DNNs) have been shown to be vulnerable to Test-Time Evasion attacks (TTEs, or adversarial examples), which alter the DNN's decision by making small changes to the input. We propose an unsupervised attack detector for DNN classifiers based on class-conditional Generative Adversarial Networks (GANs). We model the distribution of clean data, conditioned on the predicted class label, using an Auxiliary Classifier GAN (AC-GAN). Given a test sample and its predicted class, three detection statistics are computed from the AC-GAN generator and discriminator. Experiments on image classification datasets under various TTE attacks show that our method outperforms previous detection methods. We also investigate the effectiveness of anomaly detection using features from different DNN layers (input features or internal-layer features) and demonstrate, as one might expect, that anomalies are harder to detect using features closer to the DNN's output layer. Finally, we also investigate our approach for more general out-of-distribution detection.
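The abstract describes computing three detection statistics from a class-conditional (AC-)GAN given a test sample and its predicted class. The following is a minimal, hypothetical sketch of that idea, not the paper's actual method: the generator `G`, discriminator score `D`, and auxiliary class posterior `aux_post` are toy stand-ins for trained networks, and the statistics shown (discriminator realness, auxiliary-classifier confidence in the predicted class, and class-conditional reconstruction error via random latent search) are illustrative choices.

```python
# Hypothetical sketch of AC-GAN-based detection statistics.
# G, D, and aux_post are toy stand-ins, NOT the paper's trained networks.
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, LATENT_DIM, DATA_DIM = 3, 4, 8

# Toy class-conditional generator: x = W_c @ z + b_c for class c.
W = rng.normal(size=(NUM_CLASSES, DATA_DIM, LATENT_DIM))
b = rng.normal(size=(NUM_CLASSES, DATA_DIM))

def G(z, c):
    """Generate a sample for class c from latent code z."""
    return W[c] @ z + b[c]

def D(x):
    """Toy discriminator 'realness' score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.tanh(x).mean()))

def aux_post(x):
    """Toy auxiliary-classifier posterior over the classes."""
    logits = np.array([-np.sum((x - b[c]) ** 2) / DATA_DIM
                       for c in range(NUM_CLASSES)])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def detection_stats(x, c_pred, n_search=512):
    # Statistic 1: discriminator realness (low => anomalous).
    s_disc = D(x)
    # Statistic 2: auxiliary-classifier probability assigned to the
    # predicted class (low => anomalous).
    s_aux = aux_post(x)[c_pred]
    # Statistic 3: class-conditional reconstruction error, approximated
    # here by random search over the latent z (high => anomalous).
    zs = rng.normal(size=(n_search, LATENT_DIM))
    s_rec = min(np.sum((G(z, c_pred) - x) ** 2) for z in zs)
    return s_disc, s_aux, s_rec

# A sample generated for class 1 should score as "clean" under c_pred=1.
x_clean = G(rng.normal(size=LATENT_DIM), 1)
print(detection_stats(x_clean, 1))
```

In practice each statistic (or a combination) would be compared against a threshold calibrated on clean data; samples whose statistics fall outside the clean range are flagged as adversarial.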
| Original language | English (US) |
|---|---|
| Article number | 102956 |
| Journal | Computers and Security |
| Volume | 124 |
| DOIs | |
| State | Published - Jan 2023 |
All Science Journal Classification (ASJC) codes
- General Computer Science
- Law
Cite this
'Anomaly detection of adversarial examples using class-conditional generative adversarial networks', Computers and Security, 124, 102956, January 2023.