TY - GEN
T1 - Deep Neural Network Piracy without Accuracy Loss
AU - Ray, Aritra
AU - Jia, Jinyuan
AU - Saha, Sohini
AU - Chaudhuri, Jayeeta
AU - Gong, Neil Zhenqiang
AU - Chakrabarty, Krishnendu
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - A deep neural network (DNN) classifier is often viewed as the intellectual property of a model owner due to the huge resources required to train it. To protect this intellectual property, the model owner can embed a watermark into the DNN classifier (called the target classifier) such that it outputs pre-determined labels (called trigger labels) for pre-determined inputs (called trigger inputs). Given black-box access to a suspect classifier, the model owner can verify whether the suspect classifier is a pirated version of its classifier by first querying the suspect classifier with the trigger inputs and then checking whether the predicted labels match the trigger labels. Many studies have shown that an attacker can pirate the target classifier (producing a pirated classifier) by retraining or fine-tuning the target classifier to remove its watermark. However, these approaches sacrifice the accuracy of the pirated classifier, which is undesirable for critical applications such as finance and healthcare. In this work, we propose a new attack that does not sacrifice the accuracy of the pirated classifier on in-distribution testing inputs while evading detection by the model owner. Our idea is that an attacker can detect the trigger inputs at the inference stage of the pirated classifier. In particular, given a testing input, we let the pirated classifier return a random label if the input is detected as a trigger input; otherwise, the pirated classifier predicts the same label as the target classifier. We evaluate our attack on benchmark datasets and find that it can effectively identify the trigger inputs. Our attack reveals that the intellectual property of a model owner can be violated under existing watermarking techniques, highlighting the need for new techniques.
AB - A deep neural network (DNN) classifier is often viewed as the intellectual property of a model owner due to the huge resources required to train it. To protect this intellectual property, the model owner can embed a watermark into the DNN classifier (called the target classifier) such that it outputs pre-determined labels (called trigger labels) for pre-determined inputs (called trigger inputs). Given black-box access to a suspect classifier, the model owner can verify whether the suspect classifier is a pirated version of its classifier by first querying the suspect classifier with the trigger inputs and then checking whether the predicted labels match the trigger labels. Many studies have shown that an attacker can pirate the target classifier (producing a pirated classifier) by retraining or fine-tuning the target classifier to remove its watermark. However, these approaches sacrifice the accuracy of the pirated classifier, which is undesirable for critical applications such as finance and healthcare. In this work, we propose a new attack that does not sacrifice the accuracy of the pirated classifier on in-distribution testing inputs while evading detection by the model owner. Our idea is that an attacker can detect the trigger inputs at the inference stage of the pirated classifier. In particular, given a testing input, we let the pirated classifier return a random label if the input is detected as a trigger input; otherwise, the pirated classifier predicts the same label as the target classifier. We evaluate our attack on benchmark datasets and find that it can effectively identify the trigger inputs. Our attack reveals that the intellectual property of a model owner can be violated under existing watermarking techniques, highlighting the need for new techniques.
UR - http://www.scopus.com/inward/record.url?scp=85152212908&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152212908&partnerID=8YFLogxK
U2 - 10.1109/ICMLA55696.2022.00172
DO - 10.1109/ICMLA55696.2022.00172
M3 - Conference contribution
AN - SCOPUS:85152212908
T3 - Proceedings - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
SP - 1032
EP - 1038
BT - Proceedings - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
A2 - Wani, M. Arif
A2 - Kantardzic, Mehmed
A2 - Palade, Vasile
A2 - Neagu, Daniel
A2 - Yang, Longzhi
A2 - Chan, Kit-Yan
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st IEEE International Conference on Machine Learning and Applications, ICMLA 2022
Y2 - 12 December 2022 through 14 December 2022
ER -