TY - GEN
T1 - TEXTSHIELD
T2 - 29th USENIX Security Symposium
AU - Li, Jinfeng
AU - Du, Tianyu
AU - Ji, Shouling
AU - Zhang, Rong
AU - Lu, Quan
AU - Yang, Min
AU - Wang, Ting
N1 - Publisher Copyright: © 2020 by The USENIX Association. All Rights Reserved.
PY - 2020
Y1 - 2020
N2 - Text-based toxic content detection is an important tool for reducing harmful interactions in online social media environments. Yet its underlying mechanism, deep learning-based text classification (DLTC), is inherently vulnerable to maliciously crafted adversarial texts. To mitigate such vulnerabilities, intensive research has been conducted on strengthening English-based DLTC models. However, the existing defenses are not effective for Chinese-based DLTC models, due to the unique sparseness, diversity, and variation of the Chinese language. In this paper, we bridge this gap by presenting TEXTSHIELD, a new adversarial defense framework specifically designed for Chinese-based DLTC models. TEXTSHIELD differs from previous work in several key aspects: (i) generic - it applies to any Chinese-based DLTC model without requiring re-training; (ii) robust - it significantly reduces the attack success rate even under adaptive attacks; and (iii) accurate - it has little impact on the performance of DLTC models over legitimate inputs. Extensive evaluations show that it outperforms both existing methods and industry-leading platforms. Future work will explore its applicability in broader practical tasks.
UR - http://www.scopus.com/inward/record.url?scp=85091922570&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091922570&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85091922570
T3 - Proceedings of the 29th USENIX Security Symposium
SP - 1381
EP - 1398
BT - Proceedings of the 29th USENIX Security Symposium
PB - USENIX Association
Y2 - 12 August 2020 through 14 August 2020
ER -