TY - GEN
T1 - DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
T2 - 46th IEEE Symposium on Security and Privacy, SP 2025
AU - Liu, Yupei
AU - Jia, Yuqi
AU - Jia, Jinyuan
AU - Song, Dawn
AU - Gong, Neil Zhenqiang
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - LLM-integrated applications and agents are vulnerable to prompt injection attacks, where an attacker injects prompts into their inputs to induce attacker-desired outputs. A detection method aims to determine whether a given input is contaminated by an injected prompt. However, existing detection methods have limited effectiveness against state-of-the-art attacks, let alone adaptive ones. In this work, we propose DataSentinel, a game-theoretic method to detect prompt injection attacks. Specifically, DataSentinel fine-tunes an LLM to detect inputs contaminated with injected prompts that are strategically adapted to evade detection. We formulate this as a minimax optimization problem, with the objective of fine-tuning the LLM to detect strong adaptive attacks. Furthermore, we propose a gradient-based method to solve the minimax optimization problem by alternating between the inner max and outer min problems. Our evaluation results on multiple benchmark datasets and LLMs show that DataSentinel effectively detects both existing and adaptive prompt injection attacks. Our code and data are available at: https://github.com/liu00222/Open-Prompt-Injection.
UR - https://www.scopus.com/pages/publications/105009319038
UR - https://www.scopus.com/inward/citedby.url?scp=105009319038&partnerID=8YFLogxK
U2 - 10.1109/SP61157.2025.00250
DO - 10.1109/SP61157.2025.00250
M3 - Conference contribution
AN - SCOPUS:105009319038
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 2190
EP - 2208
BT - Proceedings - 46th IEEE Symposium on Security and Privacy, SP 2025
A2 - Blanton, Marina
A2 - Enck, William
A2 - Nita-Rotaru, Cristina
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 May 2025 through 15 May 2025
ER -