TY - GEN
T1 - A Benchmark Study of Backdoor Data Poisoning Defenses for Deep Neural Network Classifiers and A Novel Defense
AU - Xiang, Zhen
AU - Miller, David J.
AU - Kesidis, George
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - While data poisoning attacks on classifiers were originally proposed to degrade a classifier's usability, there has been strong recent interest in backdoor data poisoning attacks, where the classifier learns to classify to a target class whenever a backdoor pattern (e.g., a watermark or innocuous pattern) is added to an example from some class other than the target class. In this paper, we conduct a benchmark experimental study to assess the effectiveness of backdoor attacks against deep neural network (DNN) classifiers for images (CIFAR-10 domain), as well as of anomaly detection defenses against these attacks, assuming the defender has access to the (poisoned) training set. We also propose a novel defense scheme (cluster impurity (CI)) based on two ideas: i) backdoor patterns may cluster in a DNN's (e.g., penultimate) deep layer latent space; ii) image filtering (or additive noise) may remove the backdoor patterns, and thus alter the class decision produced by the DNN. We demonstrate that largely imperceptible single-pixel backdoor attacks are highly successful, with no effect on classifier usability. However, the CI approach is highly effective at detecting these attacks, and more successful than previous backdoor detection methods.
AB - While data poisoning attacks on classifiers were originally proposed to degrade a classifier's usability, there has been strong recent interest in backdoor data poisoning attacks, where the classifier learns to classify to a target class whenever a backdoor pattern (e.g., a watermark or innocuous pattern) is added to an example from some class other than the target class. In this paper, we conduct a benchmark experimental study to assess the effectiveness of backdoor attacks against deep neural network (DNN) classifiers for images (CIFAR-10 domain), as well as of anomaly detection defenses against these attacks, assuming the defender has access to the (poisoned) training set. We also propose a novel defense scheme (cluster impurity (CI)) based on two ideas: i) backdoor patterns may cluster in a DNN's (e.g., penultimate) deep layer latent space; ii) image filtering (or additive noise) may remove the backdoor patterns, and thus alter the class decision produced by the DNN. We demonstrate that largely imperceptible single-pixel backdoor attacks are highly successful, with no effect on classifier usability. However, the CI approach is highly effective at detecting these attacks, and more successful than previous backdoor detection methods.
UR - http://www.scopus.com/inward/record.url?scp=85077702345&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077702345&partnerID=8YFLogxK
U2 - 10.1109/MLSP.2019.8918908
DO - 10.1109/MLSP.2019.8918908
M3 - Conference contribution
AN - SCOPUS:85077702345
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing, MLSP 2019
PB - IEEE Computer Society
T2 - 29th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2019
Y2 - 13 October 2019 through 16 October 2019
ER -