Aligning linguistic complexity with the difficulty of English texts for L2 learners based on CEFR levels

Research output: Contribution to journal › Article › peer-review

Abstract

Selecting appropriate texts for second language (L2) learners is essential for effective education. However, current text difficulty models often fail to classify materials adequately by L2 proficiency level. This study addresses this deficiency by using the Common European Framework of Reference for Languages (CEFR) as its foundational framework. A cohort of expert English-L2 educators classified 1,181 texts from the CommonLit Ease of Readability corpus into CEFR levels. A random forest model was then trained on 24 linguistic complexity features to predict the CEFR levels of English texts for L2 learners. The model achieved 62.6% exact-level accuracy across the six granular CEFR levels and 82.6% across the three overarching levels, outperforming a baseline model based on three existing readability formulas. Additionally, it identified shared and unique linguistic features across different CEFR levels, highlighting the need to adjust text classification models to accommodate the distinct linguistic profiles of low- and high-proficiency readers.
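The two accuracy figures in the abstract come from scoring the same predictions at two granularities. The sketch below illustrates that distinction only. It assumes the three overarching levels are the standard CEFR bands (A1/A2 as A, B1/B2 as B, C1/C2 as C), which the abstract does not spell out. The function names and the toy labels are hypothetical and are not taken from the study.

```python
# Illustrative sketch: exact-level vs. collapsed-band accuracy for CEFR labels.
# Assumption: the "three overarching levels" are the standard CEFR bands A, B, C.
# All names and the toy labels below are hypothetical, not data from the study.

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def to_band(level):
    """Collapse a granular CEFR level (e.g. 'B2') to its band ('B')."""
    if level not in CEFR_LEVELS:
        raise ValueError(f"unknown CEFR level: {level!r}")
    return level[0]


def accuracy(gold, predicted):
    """Share of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty label list")
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


def band_accuracy(gold, predicted):
    """Accuracy after collapsing both label lists to the A/B/C bands."""
    return accuracy([to_band(g) for g in gold],
                    [to_band(p) for p in predicted])


# Toy labels invented for illustration only.
gold = ["A1", "A2", "B1", "B2", "C1", "C2"]
pred = ["A2", "A2", "B1", "B1", "C1", "B2"]

exact = accuracy(gold, pred)       # counts only exact six-level matches
broad = band_accuracy(gold, pred)  # also credits within-band misses

print(f"exact-level accuracy: {exact:.3f}")
print(f"band-level accuracy:  {broad:.3f}")
```

Band-level accuracy can never be lower than exact-level accuracy on the same predictions, because every exact match is also a band match. That is consistent with the gap between the 62.6% and 82.6% figures reported above.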

Original language: English (US)
Journal: Studies in Second Language Acquisition
State: Accepted/In press - 2025

All Science Journal Classification (ASJC) codes

  • Education
  • Language and Linguistics
  • Linguistics and Language
