TY - GEN
T1 - PAT
T2 - 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023
AU - Ye, Muchao
AU - Chen, Jinghui
AU - Miao, Chenglin
AU - Liu, Han
AU - Wang, Ting
AU - Ma, Fenglong
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/8/4
Y1 - 2023/8/4
N2 - Despite a plethora of prior explorations, conducting text adversarial attacks in practical settings is still challenging with the following constraints: black box - the inner structure of the victim model is unknown; hard label - the attacker only has access to the top-1 prediction results; and semantic preservation - the perturbation needs to preserve the original semantics. In this paper, we present PAT, a novel adversarial attack method employed under all these constraints. Specifically, PAT explicitly models the adversarial and non-adversarial prototypes and incorporates them to measure semantic changes for replacement selection in the hard-label black-box setting to generate high-quality samples. In each iteration, PAT finds original words that can be replaced back and selects better candidate words for perturbed positions in a geometry-aware manner guided by this estimation, which maximally improves the perturbation construction and minimally impacts the original semantics. Extensive evaluation with benchmark datasets and state-of-the-art models shows that PAT outperforms existing text adversarial attacks in terms of both attack effectiveness and semantic preservation. Moreover, we validate the efficacy of PAT against industry-leading natural language processing platforms in real-world settings.
UR - http://www.scopus.com/inward/record.url?scp=85171326441&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85171326441&partnerID=8YFLogxK
U2 - 10.1145/3580305.3599461
DO - 10.1145/3580305.3599461
M3 - Conference contribution
AN - SCOPUS:85171326441
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 3093
EP - 3104
BT - KDD 2023 - Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 6 August 2023 through 10 August 2023
ER -