Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks

Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Jinghui Chen, Fenglong Ma, Ting Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

11 Scopus citations

Abstract

Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such challenges, we advocate MDP, a novel lightweight, pluggable, and effective defense for PLMs as few-shot learners. Specifically, MDP leverages the gap between the masking-sensitivity of poisoned and clean samples: with reference to the limited few-shot data as distributional anchors, it compares the representations of given samples under varying masking and identifies poisoned samples as ones with significant variations. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness. The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP. Code available at https://github.com/zhaohan-xi/PLM-prompt-defense.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
EditorsA. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
PublisherNeural information processing systems foundation
ISBN (Electronic)9781713899921
StatePublished - 2023
Event37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: Dec 10 2023Dec 16 2023

Publication series

NameAdvances in Neural Information Processing Systems
Volume36
ISSN (Print)1049-5258

Conference

Conference37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/TerritoryUnited States
CityNew Orleans
Period12/10/2312/16/23

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint

Dive into the research topics of 'Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks'. Together they form a unique fingerprint.

Cite this