Adversarial examples are inputs containing human-imperceptible perturbations that cause machine learning models to produce incorrect outputs, i.e., false positives or false negatives. To date, two representative defense architectures have shown significant effect: (1) the model retraining architecture and (2) the input transformation architecture. However, previous defense methods in these two architectures do not produce good outputs for every input, whether adversarial example or legitimate input. Specifically, model retraining methods generate false negatives for unknown adversarial examples, and input transformation methods generate false positives for legitimate inputs. To produce good-enough outputs for every input, we propose and evaluate a new input transformation architecture based on a two-step input transformation. To overcome the limitations of the two previous defense architectures, we address the following question: how can the performance of Deep Neural Network (DNN) models on legitimate inputs be maintained while providing strong robustness against various adversarial examples? Evaluation results under various conditions show that the proposed two-step input transformation architecture provides good robustness for DNN models against state-of-the-art adversarial perturbations while maintaining high accuracy on legitimate inputs.
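To make the two concepts in the abstract concrete, the following is a minimal, hypothetical sketch (not the paper's actual method): an FGSM-style perturbation attacking a toy linear classifier, followed by a simple one-step input transformation (bit-depth reduction, as in feature squeezing). All names, parameters, and the model itself are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup: a fixed linear "model" on a 16-dimensional input
# in [0, 1]; positive score -> class 1, negative -> class 0.
rng = np.random.default_rng(0)
w = rng.normal(size=16)            # toy classifier weights (assumed)
x = rng.uniform(0.4, 0.6, 16)      # a "legitimate" input

def score(v):
    return float(w @ v)

# FGSM-style attack: for a linear score, the gradient w.r.t. the input is w,
# so stepping against sign(w) by epsilon lowers the score maximally per pixel.
eps = 0.1
x_adv = np.clip(x - eps * np.sign(w), 0.0, 1.0)

# A simple input transformation defense: reduce bit depth so that small
# per-pixel perturbations are partially quantized away.
def squeeze(v, bits=3):
    levels = 2 ** bits - 1
    return np.round(v * levels) / levels

print("clean:", score(x))
print("adversarial:", score(x_adv))          # strictly lower than clean
print("squeezed adversarial:", score(squeeze(x_adv)))
```

This sketch only illustrates the attack/defense interplay the abstract refers to; the paper's two-step architecture chains transformations so that legitimate inputs are left effectively unchanged while adversarial perturbations are disrupted.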
IEEE Transactions on Network Science and Engineering
Published: Apr 1, 2021
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Computer Science Applications
- Computer Networks and Communications