TY - GEN
T1 - Revealing backdoors, post-training, in DNN classifiers via novel inference on optimized perturbations inducing group misclassification
AU - Xiang, Zhen
AU - Miller, David J.
AU - Kesidis, George
N1 - Publisher Copyright:
© 2020 IEEE
PY - 2020/5
Y1 - 2020/5
AB - Recently, a special type of data poisoning (DP) attack against deep neural network (DNN) classifiers, known as a backdoor, was proposed. These attacks do not seek to degrade classification accuracy, but rather to have the classifier learn to classify to a target class whenever the backdoor pattern is present in a test example. Here, we address the challenging post-training detection of backdoor attacks in DNN image classifiers, wherein the defender does not have access to the poisoned training set, but only to the trained classifier itself, as well as to clean (unpoisoned) examples from the classification domain. We propose a defense against imperceptible backdoor attacks based on perturbation optimization and novel, robust detection inference. Our method detects whether the trained DNN has been backdoor-attacked and infers the source and target classes involved in an attack. It outperforms alternative defenses for several backdoor patterns, data sets, and attack settings.
UR - http://www.scopus.com/inward/record.url?scp=85081759488&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081759488&partnerID=8YFLogxK
DO - 10.1109/ICASSP40776.2020.9054581
M3 - Conference contribution
AN - SCOPUS:85081759488
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3827
EP - 3831
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -