TY - GEN
T1 - Towards inspecting and eliminating trojan backdoors in deep neural networks
AU - Guo, Wenbo
AU - Wang, Lun
AU - Xu, Yan
AU - Xing, Xinyu
AU - Du, Min
AU - Song, Dawn
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - A trojan backdoor is a hidden pattern typically implanted in a deep neural network (DNN). It could be activated and thus forces that infected model to behave abnormally when an input sample with a particular trigger is fed to that model. As such, given a DNN and clean input samples, it is challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers design and develop several pioneering solutions to address this problem. They demonstrate that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely address the problem. On the one hand, they mostly work under an unrealistic assumption of assuming the availability of the contaminated training database. On the other hand, these techniques can neither accurately detect the existence of trojan backdoors, nor restore high-fidelity triggers, especially when infected models are trained with high-dimensional data, and the triggers pertaining to the trojan vary in size, shape, and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes the detection of a trojan backdoor as solving an optimization objective function. Different from the existing technique which also models trojan detection as an optimization problem, TABOR first designs a new objective function that could guide optimization to identify a trojan backdoor more correctly and accurately. Second, TABOR borrows the idea of interpretable AI to further prune the restored triggers. Last, TABOR designs a new anomaly detection method, which could not only facilitate the identification of intentionally injected triggers but also filter out false alarms (i.e., triggers detected from an uninfected model). We train 112 DNNs on five datasets and infect these models with two existing trojan attacks. We evaluate TABOR by using these infected models, and demonstrate that TABOR has much better performance in trigger restoration, trojan detection, and elimination than Neural Cleanse, the state-of-the-art trojan detection technique.
AB - A trojan backdoor is a hidden pattern typically implanted in a deep neural network (DNN). It could be activated and thus forces that infected model to behave abnormally when an input sample with a particular trigger is fed to that model. As such, given a DNN and clean input samples, it is challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers design and develop several pioneering solutions to address this problem. They demonstrate that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely address the problem. On the one hand, they mostly work under an unrealistic assumption of assuming the availability of the contaminated training database. On the other hand, these techniques can neither accurately detect the existence of trojan backdoors, nor restore high-fidelity triggers, especially when infected models are trained with high-dimensional data, and the triggers pertaining to the trojan vary in size, shape, and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes the detection of a trojan backdoor as solving an optimization objective function. Different from the existing technique which also models trojan detection as an optimization problem, TABOR first designs a new objective function that could guide optimization to identify a trojan backdoor more correctly and accurately. Second, TABOR borrows the idea of interpretable AI to further prune the restored triggers. Last, TABOR designs a new anomaly detection method, which could not only facilitate the identification of intentionally injected triggers but also filter out false alarms (i.e., triggers detected from an uninfected model). We train 112 DNNs on five datasets and infect these models with two existing trojan attacks. We evaluate TABOR by using these infected models, and demonstrate that TABOR has much better performance in trigger restoration, trojan detection, and elimination than Neural Cleanse, the state-of-the-art trojan detection technique.
UR - http://www.scopus.com/inward/record.url?scp=85099837719&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099837719&partnerID=8YFLogxK
U2 - 10.1109/ICDM50108.2020.00025
DO - 10.1109/ICDM50108.2020.00025
M3 - Conference contribution
AN - SCOPUS:85099837719
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 162
EP - 171
BT - Proceedings - 20th IEEE International Conference on Data Mining, ICDM 2020
A2 - Plant, Claudia
A2 - Wang, Haixun
A2 - Cuzzocrea, Alfredo
A2 - Zaniolo, Carlo
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Conference on Data Mining, ICDM 2020
Y2 - 17 November 2020 through 20 November 2020
ER -