Code-Switching Metrics Using Intonation Units

Rebecca Pattichis, Dora LaCasse, Sonya Trawick, Rena Torres Cacoullos

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Code-switching (CS) metrics in NLP that are based on word-level units are misaligned with true bilingual CS behavior. Crucially, CS is not equally likely between any two words, but follows syntactic and prosodic rules. We adapt two metrics, multilinguality and CS probability, and apply them to transcribed bilingual speech, for the first time putting forward Intonation Units (IUs) - prosodic speech segments - as basic tokens for NLP tasks. In addition, we calculate these two metrics separately for distinct mixing types: alternating-language multi-word strings and single-word incorporations from one language into another. Results indicate that individual differences according to the two CS metrics are independent. However, there is a shared tendency among bilinguals for multi-word CS to occur across, rather than within, IU boundaries. That is, bilinguals tend to prosodically separate their two languages. This constraint is blurred when metric calculations do not distinguish multi-word and single-word items. These results call for a reconsideration of units of analysis in future development of CS datasets for NLP tasks.

Original languageEnglish (US)
Title of host publicationEMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
EditorsHouda Bouamor, Juan Pino, Kalika Bali
PublisherAssociation for Computational Linguistics (ACL)
Pages16840-16849
Number of pages10
ISBN (Electronic)9798891760608
StatePublished - 2023
Event2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - Hybrid, Singapore, Singapore
Duration: Dec 6 2023Dec 10 2023

Publication series

NameEMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Country/TerritorySingapore
CityHybrid, Singapore
Period12/6/2312/10/23

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language

Cite this