TY - GEN
T1 - UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning
T2 - Findings of the Association for Computational Linguistics: EMNLP 2023
AU - Wang, Ziyao
AU - Le, Thai
AU - Lee, Dongwon
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Consider a scenario where an author (e.g., an activist or whistle-blower) with many public writings wishes to write “anonymously” when attackers may have already built an authorship attribution (AA) model based on public writings, including those of the author. To enable her wish, we ask the question: “can one make the publicly released writings, T, unattributable so that AA models trained on T cannot attribute its authorship well?” Toward this question, we present a novel solution, UPTON, that exploits black-box data poisoning methods to weaken the authorship features in training samples and make released texts unlearnable. It differs from previous obfuscation works, e.g., adversarial attacks that modify test samples, or backdoor works that change model outputs only when trigger words occur. Using four authorship datasets (IMDb10, IMDb64, Enron, and WJO), we present empirical validation in which UPTON successfully downgrades the accuracy of AA models to an impractically low level (∼35%) while keeping the texts readable (semantic similarity > 0.9). UPTON remains effective against AA models that have already been trained on available clean writings of the authors.
UR - http://www.scopus.com/inward/record.url?scp=85183308547&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183308547&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85183308547
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 11952
EP - 11965
BT - Findings of the Association for Computational Linguistics: EMNLP 2023
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -