TY - GEN
T1 - Anomaly detection of attacks (ADA) on DNN classifiers at test time
AU - Miller, David J.
AU - Wang, Yujia
AU - Kesidis, George
N1 - Funding Information:
This research was supported by an AFOSR DDDAS grant and a Cisco gift.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/31
Y1 - 2018/10/31
AB - A significant threat to the wide deployment of machine learning-based classifiers is adversarial learning attacks, especially at test time. Recently there has been significant development in defending against such attacks. Several such works seek to robustify the classifier so that it makes "correct" decisions on perturbed patterns. We argue it is often operationally more important to detect the attack than to "correctly classify" in the face of it (classification can proceed if no attack is detected). We hypothesize that, even if human-imperceptible, adversarial perturbations are machine-detectable. We propose a purely unsupervised anomaly detector (AD), based on suitable (null hypothesis) density models for the different layers of a deep neural net and a novel decision statistic built upon the Kullback-Leibler divergence. This paper addresses: 1) When is it appropriate to aim to "correctly classify" a perturbed pattern? 2) What is a good AD detection statistic, one which exploits all likely sources of anomalousness associated with a test-time attack? 3) Where in a deep neural net (DNN) (in an early layer, a middle layer, or at the penultimate layer) will the most anomalous signature manifest? Tested on the MNIST and CIFAR-10 image databases under three prominent attack strategies, our approach outperforms previous detection methods, achieving strong ROC AUC detection accuracy on two attacks and substantially better accuracy than previously reported on the third (strongest) attack.
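N1 - The abstract describes detection via layer-wise null-hypothesis density models and a Kullback-Leibler divergence decision statistic. The Python sketch below illustrates that general idea only, assuming per-class Gaussian-mixture null models fit to clean layer activations and taking the maximum over layers of the KL divergence between the density-induced class posterior and the DNN's softmax output as the anomaly score; the function names, GMM choice, and score construction are assumptions for illustration, not the paper's exact method.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_null_models(layer_feats, labels, n_components=5, seed=0):
    # Fit one Gaussian mixture per class to clean (attack-free) layer activations.
    models = {}
    for c in np.unique(labels):
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=seed)
        gmm.fit(layer_feats[labels == c])
        models[c] = gmm
    return models

def class_posterior(models, feat, prior=None):
    # Class posterior induced by the per-class null densities at one layer.
    classes = sorted(models.keys())
    log_lik = np.array([models[c].score_samples(feat[None, :])[0] for c in classes])
    log_prior = np.log(prior if prior is not None
                       else np.full(len(classes), 1.0 / len(classes)))
    log_post = log_lik + log_prior
    log_post -= np.logaddexp.reduce(log_post)   # normalize in log space
    return np.exp(log_post)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions, with clipping for numerical safety.
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def anomaly_score(per_layer_models, per_layer_feats, dnn_softmax):
    # Max over layers of KL(density-induced posterior || DNN softmax output).
    scores = [kl_divergence(class_posterior(models, feat), dnn_softmax)
              for models, feat in zip(per_layer_models, per_layer_feats)]
    return max(scores)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for clean penultimate-layer activations of a 3-class DNN.
    feats = rng.normal(size=(300, 8)) + np.repeat(np.eye(3, 8) * 4, 100, axis=0)
    labels = np.repeat(np.arange(3), 100)
    models = fit_null_models(feats, labels)
    softmax_out = np.array([0.05, 0.9, 0.05])   # DNN's predicted posterior
    score = anomaly_score([models], [rng.normal(size=8)], softmax_out)
    print(f"anomaly score: {score:.3f}")

In such a scheme, a detection threshold would be set on clean validation data for a target false-positive rate; test samples scoring above it are flagged as attacks and the remainder are classified normally.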
UR - http://www.scopus.com/inward/record.url?scp=85057066178&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057066178&partnerID=8YFLogxK
U2 - 10.1109/MLSP.2018.8517069
DO - 10.1109/MLSP.2018.8517069
M3 - Conference contribution
AN - SCOPUS:85057066178
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - 2018 IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2018 - Proceedings
A2 - Pustelnik, Nelly
A2 - Tan, Zheng-Hua
A2 - Ma, Zhanyu
A2 - Larsen, Jan
PB - IEEE Computer Society
T2 - 28th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2018
Y2 - 17 September 2018 through 20 September 2018
ER -