TY - GEN
T1 - MULTITuDE
T2 - 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
AU - Macko, Dominik
AU - Moro, Robert
AU - Uchendu, Adaku
AU - Lucas, Jason Samuel
AU - Yamashita, Michiharu
AU - Pikuliak, Matúš
AU - Srba, Ivan
AU - Le, Thai
AU - Lee, Dongwon
AU - Simko, Jakub
AU - Bielikova, Maria
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - There is a lack of research into capabilities of recent LLMs to generate convincing text in languages other than English and into performance of detectors of machine-generated text in multilingual settings. This is also reflected in the available benchmarks which lack authentic texts in languages other than English and predominantly cover older generators. To fill this gap, we introduce MULTITuDE1, a novel benchmarking dataset for multilingual machine-generated text detection comprising of 74,081 authentic and machine-generated texts in 11 languages (ar, ca, cs, de, en, es, nl, pt, ru, uk, and zh) generated by 8 multilingual LLMs. Using this benchmark, we compare the performance of zero-shot (statistical and black-box) and fine-tuned detectors. Considering the multilinguality, we evaluate 1) how these detectors generalize to unseen languages (linguistically similar as well as dissimilar) and unseen LLMs and 2) whether the detectors improve their performance when trained on multiple languages.
AB - There is a lack of research into capabilities of recent LLMs to generate convincing text in languages other than English and into performance of detectors of machine-generated text in multilingual settings. This is also reflected in the available benchmarks which lack authentic texts in languages other than English and predominantly cover older generators. To fill this gap, we introduce MULTITuDE1, a novel benchmarking dataset for multilingual machine-generated text detection comprising of 74,081 authentic and machine-generated texts in 11 languages (ar, ca, cs, de, en, es, nl, pt, ru, uk, and zh) generated by 8 multilingual LLMs. Using this benchmark, we compare the performance of zero-shot (statistical and black-box) and fine-tuned detectors. Considering the multilinguality, we evaluate 1) how these detectors generalize to unseen languages (linguistically similar as well as dissimilar) and unseen LLMs and 2) whether the detectors improve their performance when trained on multiple languages.
UR - http://www.scopus.com/inward/record.url?scp=85182833236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182833236&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.emnlp-main.616
DO - 10.18653/v1/2023.emnlp-main.616
M3 - Conference contribution
AN - SCOPUS:85182833236
T3 - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 9960
EP - 9987
BT - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
A2 - Bouamor, Houda
A2 - Pino, Juan
A2 - Bali, Kalika
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -