TY - JOUR
T1 - Feature-filter: Detecting adversarial examples by filtering out recessive features
T2 - Applied Soft Computing
AU - Liu, Hui
AU - Zhao, Bo
AU - Ji, Minzhi
AU - Peng, Yuefeng
AU - Guo, Jiabao
AU - Liu, Peng
N1 - Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/7
Y1 - 2022/7
N2 - Deep neural networks (DNNs) have achieved state-of-the-art performance in numerous tasks involving complex analysis of raw data, such as self-driving systems and biometric recognition systems. However, recent works have shown that DNNs are under threat from adversarial example attacks: an adversary can easily change the outputs of a DNN by adding small, well-designed perturbations to its inputs. Adversarial example detection is therefore fundamental for robust DNN-based services. From a human-centric perspective, this paper divides image features into dominant features, which are comprehensible to humans, and recessive features, which are incomprehensible to humans yet exploited by DNNs. Based on this perspective, the paper proposes a new viewpoint: imperceptible adversarial examples are the product of recessive features misleading neural networks, and an adversarial attack amounts to enriching these recessive features. The imperceptibility of adversarial examples indicates that the perturbations enrich recessive features while hardly affecting dominant features. Consequently, adversarial examples are sensitive to filtering out recessive features, whereas benign examples are immune to such an operation. Inspired by this idea, we propose a label-only adversarial detector, referred to as a feature-filter. The feature-filter uses the discrete cosine transform (DCT) to approximately separate recessive features from dominant features and obtain a filtered image. A comprehensive user study demonstrates that the DCT-based filter can reliably filter out recessive features from a test image. By comparing only the DNN's prediction labels on an input and its filtered version, the feature-filter detects imperceptible adversarial examples in real time with high accuracy and few false positives.
UR - http://www.scopus.com/inward/record.url?scp=85130572342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130572342&partnerID=8YFLogxK
DO - 10.1016/j.asoc.2022.109027
M3 - Article
AN - SCOPUS:85130572342
SN - 1568-4946
VL - 124
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 109027
ER -
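
The abstract above describes a label-only detection scheme: low-pass the image in the DCT domain to strip the high-frequency "recessive" features, then flag the input as adversarial if the model's predicted label changes on the filtered version. The following is a minimal sketch of that idea, not the authors' implementation; the classifier interface (model.predict) and the frequency cutoff are hypothetical assumptions for illustration.

import numpy as np
from scipy.fftpack import dct, idct


def dct2(x):
    # 2-D type-II DCT, applied separably over rows and columns.
    return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")


def idct2(x):
    # Inverse 2-D DCT, undoing dct2.
    return idct(idct(x, axis=0, norm="ortho"), axis=1, norm="ortho")


def dct_low_pass(image, cutoff=16):
    # Keep only the top-left cutoff x cutoff block of DCT coefficients
    # (the lowest spatial frequencies) per channel; zero out the rest.
    # Expects a float image in [0, 1] with shape (H, W, C).
    filtered = np.zeros_like(image, dtype=np.float64)
    for c in range(image.shape[2]):
        coeffs = dct2(image[:, :, c].astype(np.float64))
        mask = np.zeros_like(coeffs)
        mask[:cutoff, :cutoff] = 1.0
        filtered[:, :, c] = idct2(coeffs * mask)
    return np.clip(filtered, 0.0, 1.0)


def is_adversarial(model, image, cutoff=16):
    # Label-only test: compare predicted labels on the original image
    # and on its low-pass-filtered version; a mismatch flags the input.
    pred_orig = np.argmax(model.predict(image[None, ...]), axis=1)[0]
    filtered = dct_low_pass(image, cutoff)
    pred_filt = np.argmax(model.predict(filtered[None, ...]), axis=1)[0]
    return pred_orig != pred_filt

The cutoff trades off the two error modes: a smaller block discards more high-frequency content, catching weaker perturbations but risking label flips on benign images; a larger block does the reverse.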