TY - GEN
T1 - The limitations of deep learning in adversarial settings
AU - Papernot, Nicolas
AU - McDaniel, Patrick
AU - Jha, Somesh
AU - Fredrikson, Matt
AU - Celik, Z. Berkay
AU - Swami, Ananthram
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/5/9
Y1 - 2016/5/9
N2 - Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified into specific targets by a DNN, with a 97% adversarial success rate while modifying on average only 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
AB - Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified into specific targets by a DNN, with a 97% adversarial success rate while modifying on average only 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.
UR - http://www.scopus.com/inward/record.url?scp=84978047763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978047763&partnerID=8YFLogxK
U2 - 10.1109/EuroSP.2016.36
DO - 10.1109/EuroSP.2016.36
M3 - Conference contribution
AN - SCOPUS:84978047763
T3 - Proceedings - 2016 IEEE European Symposium on Security and Privacy, EuroS&P 2016
SP - 372
EP - 387
BT - Proceedings - 2016 IEEE European Symposium on Security and Privacy, EuroS&P 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st IEEE European Symposium on Security and Privacy, EuroS&P 2016
Y2 - 21 March 2016 through 24 March 2016
ER -