Towards inspecting and eliminating trojan backdoors in deep neural networks

Wenbo Guo, Lun Wang, Yan Xu, Xinyu Xing, Min Du, Dawn Song

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

34 Scopus citations

Abstract

A trojan backdoor is a hidden pattern typically implanted in a deep neural network (DNN). When an input sample carrying a particular trigger is fed to the infected model, the backdoor is activated and forces the model to behave abnormally. Given only a DNN and clean input samples, it is therefore challenging to inspect the model and determine whether a trojan backdoor exists. Researchers have recently designed and developed several pioneering solutions to this problem and demonstrated that the proposed techniques have great potential in trojan detection. However, we show that none of these existing techniques completely addresses the problem. On the one hand, they mostly rely on the unrealistic assumption that the contaminated training database is available. On the other hand, they can neither accurately detect the existence of trojan backdoors nor restore high-fidelity triggers, especially when infected models are trained with high-dimensional data and the triggers vary in size, shape, and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes the detection of a trojan backdoor as solving an optimization problem. Unlike the existing technique that also models trojan detection as an optimization problem, TABOR first designs a new objective function that guides the optimization to identify a trojan backdoor more correctly and accurately. Second, TABOR borrows ideas from interpretable AI to further prune the restored triggers. Last, TABOR designs a new anomaly detection method, which not only facilitates the identification of intentionally injected triggers but also filters out false alarms (i.e., triggers detected from an uninfected model). We train 112 DNNs on five datasets and infect these models with two existing trojan attacks. We evaluate TABOR on these infected models and demonstrate that it performs much better in trigger restoration, trojan detection, and elimination than Neural Cleanse, the state-of-the-art trojan detection technique.
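
To make the idea of casting trigger restoration as an optimization problem more concrete, here is a minimal sketch of the generic mask-and-pattern formulation used by optimization-based detectors. It is not TABOR's objective, which the abstract does not specify. Everything in it is an assumption made for illustration: the toy logistic "model" with weights `w` and bias `b` stands in for a DNN, the function name `restore_trigger` and the parameters `lam`, `lr`, and `steps` are hypothetical, and feature 2 is wired by hand to act as the backdoor.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def restore_trigger(w, b, samples, lam=0.05, lr=0.5, steps=500):
    """Search for a mask m and pattern p that push clean samples to class 1.

    Toy model (assumption): P(class 1 | x) = sigmoid(w . x + b).
    Stamping: x' = (1 - m) * x + m * p, element-wise.
    Objective: mean cross-entropy towards class 1 + lam * sum(m).
    """
    d = len(w)
    a = [-2.0] * d  # mask logits, m = sigmoid(a); start with a nearly empty mask
    c = [0.0] * d   # pattern logits, p = sigmoid(c)
    for _ in range(steps):
        m = [sigmoid(v) for v in a]
        p = [sigmoid(v) for v in c]
        # Gradient of the sparsity penalty lam * sum(m) with respect to a.
        grad_a = [lam * m[i] * (1 - m[i]) for i in range(d)]
        grad_c = [0.0] * d
        for x in samples:
            stamped = [(1 - m[i]) * x[i] + m[i] * p[i] for i in range(d)]
            z = sum(w[i] * stamped[i] for i in range(d)) + b
            # Derivative of the cross-entropy (target class 1) w.r.t. z, averaged.
            dz = (sigmoid(z) - 1.0) / len(samples)
            for i in range(d):
                dx = dz * w[i]
                grad_a[i] += dx * (p[i] - x[i]) * m[i] * (1 - m[i])
                grad_c[i] += dx * m[i] * p[i] * (1 - p[i])
        # Plain gradient descent on both sets of logits.
        a = [a[i] - lr * grad_a[i] for i in range(d)]
        c = [c[i] - lr * grad_c[i] for i in range(d)]
    return [sigmoid(v) for v in a], [sigmoid(v) for v in c]

# Hand-built "infected" model: a large value in feature 2 flips the prediction.
random.seed(0)
w, b = [0.1, -0.1, 6.0, 0.05], -3.0
samples = [[random.random(), random.random(), 0.0, random.random()]
           for _ in range(20)]
mask, pattern = restore_trigger(w, b, samples)
print("mask:", [round(v, 2) for v in mask])
print("pattern:", [round(v, 2) for v in pattern])
```

In this toy setting the sparsity penalty keeps the mask small everywhere except on the feature that actually controls the prediction, which is the intuition behind recovering a compact trigger. A detector would repeat such a search per candidate target label and then decide whether any restored trigger looks anomalous; that decision step is not sketched here.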

Original language: English (US)
Title of host publication: Proceedings - 20th IEEE International Conference on Data Mining, ICDM 2020
Editors: Claudia Plant, Haixun Wang, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 162-171
Number of pages: 10
ISBN (Electronic): 9781728183169
State: Published - Nov 2020
Event: 20th IEEE International Conference on Data Mining, ICDM 2020 - Virtual, Sorrento, Italy
Duration: Nov 17 2020 to Nov 20 2020

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2020-November
ISSN (Print): 1550-4786

Conference

Conference: 20th IEEE International Conference on Data Mining, ICDM 2020
Country/Territory: Italy
City: Virtual, Sorrento
Period: 11/17/20 to 11/20/20

All Science Journal Classification (ASJC) codes

  • General Engineering
