TY - GEN
T1 - TEXTSHIELD
T2 - 29th USENIX Security Symposium
AU - Li, Jinfeng
AU - Du, Tianyu
AU - Ji, Shouling
AU - Zhang, Rong
AU - Lu, Quan
AU - Yang, Min
AU - Wang, Ting
N1 - Funding Information:
We sincerely appreciate the shepherding from David Evans. We would also like to thank the anonymous reviewers for their constructive comments and input to improve our paper. This work was partly supported by the National Key Research and Development Program of China under No. 2018YFB0804102, NSFC under Nos. 61772466, U1936215, and U1836202, the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars under No. LR19F020003, the Provincial Key Research and Development Program of Zhejiang, China under No. 2019C01055, the Ant Financial Research Funding, and the Alibaba-ZJU Joint Research Institute of Frontier Technologies. Ting Wang is partially supported by the National Science Foundation under Grant Nos. 1910546, 1953813, and 1846151. Min Yang is partially supported by NSFC under Nos. U1636204 and U1836213. Min Yang is also a member of the Shanghai Institute of Intelligent Electronics & Systems and the Shanghai Institute for Advanced Communication and Data Science.
Publisher Copyright:
© 2020 by The USENIX Association. All Rights Reserved.
PY - 2020
Y1 - 2020
AB - Text-based toxic content detection is an important tool for reducing harmful interactions in online social media environments. Yet, its underlying mechanism, deep learning-based text classification (DLTC), is inherently vulnerable to maliciously crafted adversarial texts. To mitigate such vulnerabilities, intensive research has been conducted on strengthening English-based DLTC models. However, the existing defenses are not effective for Chinese-based DLTC models, due to the unique sparseness, diversity, and variation of the Chinese language. In this paper, we bridge this striking gap by presenting TEXTSHIELD, a new adversarial defense framework specifically designed for Chinese-based DLTC models. TEXTSHIELD differs from previous work in several key aspects: (i) generic - it applies to any Chinese-based DLTC model without requiring re-training; (ii) robust - it significantly reduces the attack success rate even under the setting of adaptive attacks; and (iii) accurate - it has little impact on the performance of DLTC models over legitimate inputs. Extensive evaluations show that it outperforms both existing methods and the industry-leading platforms. Future work will explore its applicability in broader practical tasks.
UR - http://www.scopus.com/inward/record.url?scp=85091922570&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091922570&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85091922570
T3 - Proceedings of the 29th USENIX Security Symposium
SP - 1381
EP - 1398
BT - Proceedings of the 29th USENIX Security Symposium
PB - USENIX Association
Y2 - 12 August 2020 through 14 August 2020
ER -