TY - GEN
T1 - Revealing perceptible backdoors in DNNs, without the training set, via the maximum achievable misclassification fraction statistic
AU - Xiang, Zhen
AU - Miller, David J.
AU - Wang, Hang
AU - Kesidis, George
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/9
Y1 - 2020/9
N2 - Recently, a backdoor data poisoning attack was proposed, which adds mislabeled examples, each embedded with a backdoor pattern, to the training set, aiming to have the classifier learn to decide the target class whenever the backdoor pattern is present in a test sample. We address post-training detection of innocuous perceptible backdoors in DNN image classifiers, wherein the defender does not have access to the poisoned training set. This problem is challenging because, without the poisoned training set, the defender has no hint about the actual backdoor pattern used during training. We identify two properties of perceptible backdoor patterns - spatial invariance and robustness - based upon which we propose a novel detector using the maximum achievable misclassification fraction (MAMF) statistic. Our detector infers whether the trained DNN has been backdoor-attacked and, if so, the source and target classes involved. Experimentally, our detector outperforms existing detectors.
AB - Recently, a backdoor data poisoning attack was proposed, which adds mislabeled examples, each embedded with a backdoor pattern, to the training set, aiming to have the classifier learn to decide the target class whenever the backdoor pattern is present in a test sample. We address post-training detection of innocuous perceptible backdoors in DNN image classifiers, wherein the defender does not have access to the poisoned training set. This problem is challenging because, without the poisoned training set, the defender has no hint about the actual backdoor pattern used during training. We identify two properties of perceptible backdoor patterns - spatial invariance and robustness - based upon which we propose a novel detector using the maximum achievable misclassification fraction (MAMF) statistic. Our detector infers whether the trained DNN has been backdoor-attacked and, if so, the source and target classes involved. Experimentally, our detector outperforms existing detectors.
UR - http://www.scopus.com/inward/record.url?scp=85096464113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096464113&partnerID=8YFLogxK
U2 - 10.1109/MLSP49062.2020.9231861
DO - 10.1109/MLSP49062.2020.9231861
M3 - Conference contribution
AN - SCOPUS:85096464113
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - Proceedings of the 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing, MLSP 2020
PB - IEEE Computer Society
T2 - 30th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2020
Y2 - 21 September 2020 through 24 September 2020
ER -