Deep neural networks (DNNs) have achieved state-of-the-art performance on numerous tasks involving complex analysis of raw data, such as self-driving systems and biometric recognition systems. However, recent work has shown that DNNs are vulnerable to adversarial example attacks: an adversary can easily change a DNN's output by adding small, well-designed perturbations to its input. Adversarial example detection is therefore fundamental for robust DNN-based services. From a human-centric perspective, this paper divides image features into dominant features, which are comprehensible to humans, and recessive features, which are incomprehensible to humans yet exploited by DNNs. Based on this perspective, the paper proposes a new viewpoint: imperceptible adversarial examples are the product of recessive features misleading neural networks, and adversarial attacks enrich these recessive features. The imperceptibility of adversarial examples indicates that the perturbations enrich recessive features while hardly affecting dominant features. Consequently, adversarial examples are sensitive to operations that filter out recessive features, whereas benign examples are immune to them. Inspired by this idea, we propose a label-only adversarial detector, referred to as a feature-filter. The feature-filter uses the discrete cosine transform (DCT) to approximately separate recessive features from dominant features and obtain a filtered image. A comprehensive user study demonstrates that the DCT-based filter can reliably remove recessive features from a test image. By comparing only the DNN's prediction labels on the input and on its filtered version, the feature-filter detects imperceptible adversarial examples in real time with high accuracy and few false positives.
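The abstract does not specify the exact filter design, but the detection scheme it describes can be sketched as follows: a DCT low-pass filter that keeps the low-frequency coefficients (assumed here to approximate the dominant features) and zeroes the high-frequency ones (assumed to carry the recessive features), followed by a label-only comparison. The function names, the `keep` parameter, and the low-pass assumption are illustrative, not the paper's actual implementation.

```python
# Minimal sketch of a DCT-based feature-filter, assuming the recessive
# features live in the high-frequency DCT coefficients. This is an
# illustration of the idea, not the paper's exact filter.
import numpy as np
from scipy.fft import dctn, idctn


def dct_low_pass(image, keep=8):
    """Keep only the lowest `keep` x `keep` 2-D DCT coefficients
    (assumption: low frequencies approximate the dominant features)."""
    coeffs = dctn(image, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0
    return idctn(coeffs * mask, norm="ortho")


def feature_filter_detect(model_label, image, keep=8):
    """Label-only detection: flag the input as adversarial when the
    model's predicted label changes after the filtering step.
    `model_label` is any function mapping an image to a class label."""
    filtered = dct_low_pass(image, keep=keep)
    return model_label(image) != model_label(filtered)
```

Because the detector compares only prediction labels, it needs no access to logits, gradients, or model internals, which is what makes a real-time, model-agnostic deployment plausible.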