TY - GEN
T1 - Forget to Flourish
T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
AU - Rashid, Md Rafi Ur
AU - Liu, Jing
AU - Koike-Akino, Toshiaki
AU - Wang, Ye
AU - Mehnaz, Shagufta
N1 - Publisher Copyright:
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2025/4/11
Y1 - 2025/4/11
N2 - Fine-tuning large language models on private data for downstream applications poses significant privacy risks in potentially exposing sensitive information. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models, allowing anyone to publish without rigorous verification. This scenario creates a privacy threat, as pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model-unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.
AB - Fine-tuning large language models on private data for downstream applications poses significant privacy risks in potentially exposing sensitive information. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models, allowing anyone to publish without rigorous verification. This scenario creates a privacy threat, as pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model-unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.
UR - https://www.scopus.com/pages/publications/105004296099
UR - https://www.scopus.com/pages/publications/105004296099#tab=citedBy
U2 - 10.1609/aaai.v39i19.34218
DO - 10.1609/aaai.v39i19.34218
M3 - Conference contribution
AN - SCOPUS:105004296099
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 20139
EP - 20147
BT - Special Track on AI Alignment
A2 - Walsh, Toby
A2 - Shah, Julie
A2 - Kolter, Zico
PB - Association for the Advancement of Artificial Intelligence
Y2 - 25 February 2025 through 4 March 2025
ER -