GPT-who: An Information Density-based Machine-Generated Text Detector

Saranya Venkatraman, Adaku Uchendu, Dongwon Lee

Research output: Chapter in Book/Report/Conference proceedingConference contribution

4 Scopus citations

Abstract

The Uniform Information Density (UID) principle posits that humans prefer to spread information evenly during language production. We examine if this UID principle can help capture differences between Large Language Models (LLMs)-generated and human-generated texts. We propose GPT-who, the first psycholinguistically-inspired domain-agnostic statistical detector. This detector employs UID-based features to model the unique statistical signature of each LLM and human author for accurate detection. We evaluate our method using 4 large-scale benchmark datasets and find that GPT-who outperforms state-of-the-art detectors (both statistical- & non-statistical) such as GLTR, GPTZero, DetectGPT, OpenAI detector, and ZeroGPT by over 20% across domains. In addition to better performance, it is computationally inexpensive and utilizes an interpretable representation of text articles. We find that GPT-who can distinguish texts generated by very sophisticated LLMs, even when the overlying text is indiscernible. UID-based measures for all datasets and code are available at https://github.com/saranya-venkatraman/gpt-who.

Original languageEnglish (US)
Title of host publicationFindings of the Association for Computational Linguistics
Subtitle of host publicationNAACL 2024 - Findings
EditorsKevin Duh, Helena Gomez, Steven Bethard
PublisherAssociation for Computational Linguistics (ACL)
Pages103-115
Number of pages13
ISBN (Electronic)9798891761193
StatePublished - 2024
Event2024 Findings of the Association for Computational Linguistics: NAACL 2024 - Mexico City, Mexico
Duration: Jun 16 2024Jun 21 2024

Publication series

NameFindings of the Association for Computational Linguistics: NAACL 2024 - Findings

Conference

Conference2024 Findings of the Association for Computational Linguistics: NAACL 2024
Country/TerritoryMexico
CityMexico City
Period6/16/246/21/24

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Software

Fingerprint

Dive into the research topics of 'GPT-who: An Information Density-based Machine-Generated Text Detector'. Together they form a unique fingerprint.

Cite this