Semantic-Preserving Adversarial Example Attack against BERT

Chongyang Gao, Kang Gu, Soroush Vosoughi, Shagufta Mehnaz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

Adversarial example attacks against textual data have been drawing increasing attention in both the natural language processing (NLP) and security domains. However, most existing attacks overlook the importance of semantic similarity and yield easily recognizable adversarial samples. As a result, the defense methods developed in response to these attacks remain vulnerable and could be evaded by advanced adversarial examples that maintain high semantic similarity with the original, non-adversarial text. Hence, this paper aims to investigate the extent to which textual adversarial examples can maintain such high semantic similarity. We propose Reinforce attack, a reinforcement learning-based framework to generate adversarial text that preserves high semantic similarity with the original text. In particular, the attack process is controlled by a reward function, rather than by heuristics as in previous methods, to encourage higher semantic similarity and lower query costs. Through automatic and human evaluations, we show that our generated adversarial texts preserve significantly higher semantic similarity than state-of-the-art attacks while achieving similar attack success rates (and at times outperforming them), thus uncovering novel challenges for effective defenses.
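
The abstract states only that a reward function encourages higher semantic similarity and lower query costs; it does not give the formula. The following is a minimal sketch of what such a reward could look like, not the authors' actual definition. The names `RewardWeights` and `attack_reward`, the linear form, and the default coefficients are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical coefficients chosen for illustration; not taken from the paper.
    similarity: float = 1.0
    query_cost: float = 0.01


def attack_reward(
    fooled: bool,
    similarity: float,
    num_queries: int,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Illustrative reward: pay for semantic similarity on success, charge per query.

    `similarity` is assumed to be a score in [0, 1] between the original and
    the adversarial text, produced by some sentence-similarity model.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if num_queries < 0:
        raise ValueError("num_queries must be non-negative")
    # Similarity only counts when the victim model's prediction actually flips.
    similarity_term = weights.similarity * similarity if fooled else 0.0
    # Every query to the victim model is charged, whether or not the attack succeeds.
    return similarity_term - weights.query_cost * num_queries
```

Under these assumed weights, a successful attack that stays closer to the original text, or that needs fewer queries, receives a strictly higher reward, which is the kind of pressure the abstract describes.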

Original language: English (US)
Title of host publication: TrustNLP 2024 - 4th Workshop on Trustworthy Natural Language Processing, Proceedings of the Workshop
Editors: Anaelia Ovalle, Kai-Wei Chang, Yang Trista Cao, Ninareh Mehrabi, Jieyu Zhao, Aram Galstyan, Jwala Dhamala, Anoop Kumar, Rahul Gupta
Publisher: Association for Computational Linguistics (ACL)
Pages: 202-207
Number of pages: 6
ISBN (Electronic): 9798891761131
State: Published - 2024
Event: 4th Workshop on Trustworthy Natural Language Processing, TrustNLP 2024 - Mexico City, Mexico
Duration: Jun 21 2024 → …

Publication series

Name: TrustNLP 2024 - 4th Workshop on Trustworthy Natural Language Processing, Proceedings of the Workshop

Conference

Conference: 4th Workshop on Trustworthy Natural Language Processing, TrustNLP 2024
Country/Territory: Mexico
City: Mexico City
Period: 6/21/24 → …

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Software
  • Linguistics and Language
