Learning to Write Rationally: How Information Is Distributed in Non-Native Speakers' Essays

Zixin Tang, Janet G. van Hell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

People tend to distribute information evenly during language production, such as when writing an essay, to improve clarity and communication. However, this may pose challenges to non-native speakers. In this study, we compared essays written by second language (L2) learners with various native language (L1) backgrounds to investigate how they distribute information in their non-native L2 written essays. We used information-based metrics, i.e., word surprisal, word entropy, and uniform information density, to estimate how writers distribute information throughout the essay. The surprisal and constancy of entropy metrics showed that as writers' L2 proficiency increases, their essays show more native-like patterns, indicating more native-like mechanisms in delivering informative but less surprising content. In contrast, the uniformity of information density metric showed fewer differences across L2 speakers, regardless of their L1 background and L2 proficiency, suggesting that distributing information evenly is a more universal mechanism in human language production. This work provides a computational approach to investigate language diversity, variation, and L2 acquisition via human language production.
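The abstract names three information-based metrics without giving formulas. As a rough illustration only, the sketch below uses the standard textbook definitions: surprisal as the negative log probability of a word, entropy as the expected surprisal of a next-word distribution, and, as an assumed stand-in for uniform information density, the variance of per-word surprisal (lower variance meaning a more even spread). The function names and the probability values are hypothetical, and the paper's actual operationalization and language model may differ.

```python
import math
from statistics import pvariance


def surprisal(probs):
    """Per-word surprisal in bits: -log2 p(word | context)."""
    return [-math.log2(p) for p in probs]


def entropy(dist):
    """Shannon entropy in bits of a next-word probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def uid_variance(surprisals):
    """Assumed UID proxy: population variance of per-word surprisal.

    Lower values mean information is spread more evenly across words.
    """
    return pvariance(surprisals)


# Hypothetical per-word probabilities, as a language model might assign them.
even_text = surprisal([0.25, 0.25, 0.25, 0.25])
uneven_text = surprisal([0.5, 0.5, 0.125, 0.03125])

print(uid_variance(even_text))    # evenly distributed information
print(uid_variance(uneven_text))  # information concentrated in a few words
print(entropy([0.5, 0.25, 0.25]))
```

Both toy sequences carry the same average surprisal per word, so the variance isolates how evenly that information is spread rather than how much of it there is.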

Original language: English (US)
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Publisher: Association for Computational Linguistics (ACL)
Pages: 12868-12879
Number of pages: 12
ISBN (Electronic): 9798891761643
State: Published - 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, United States
Duration: Nov 12 2024 → Nov 16 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Country/Territory: United States
City: Hybrid, Miami
Period: 11/12/24 → 11/16/24

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language
