L2 English speaking syntactic complexity: Data preprocessing issues, reliability of automated analysis, and the effects of proficiency, L1 background, and topic

Minjin Kim, Xiaofei Lu

Research output: Contribution to journal › Article › peer-review


The effects of learner- and task-related variables on second language (L2) writing syntactic complexity (SC) have been extensively investigated. However, previous research has rarely assessed the reliability of computational tools for analyzing the SC of L2 spoken production, and we know less about the effects of such variables on L2 speaking SC. Using data from the International Corpus Network of Asian Learners of English, this study explores data preprocessing issues for preparing L2 English speech samples for automated SC analysis, evaluates the reliability of L2 Syntactic Complexity Analyzer on preprocessed L2 English speech samples, and examines the effects of proficiency, first language (L1) background, and topic on L2 speaking SC. Our manual analysis of 30 random speech samples identified several issues that can be addressed through preprocessing to improve the accuracy of automated SC analysis. Results from multiple linear mixed-effects models revealed significant effects of proficiency, L1 background, and topic on the mean length of clause, the number of complex AS-units per AS-unit, and the number of dependent clauses and complex nominals per clause in L2 learners’ spoken production. Our findings have useful implications for L2 speaking pedagogy and assessment as well as future L2 speaking SC research.
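The four indices named in the abstract are ratios over per-sample counts. As a minimal sketch, assuming the counts (words, clauses, dependent clauses, complex nominals, AS-units, complex AS-units) have already been produced by some analyzer or manual annotation, the ratios could be computed as below. The `SampleCounts` class, the `sc_indices` function, and the example numbers are hypothetical illustrations, not the authors' pipeline or the output format of L2 Syntactic Complexity Analyzer.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    """Hypothetical per-sample counts; field names are illustrative only."""
    words: int
    clauses: int
    dependent_clauses: int
    complex_nominals: int
    as_units: int
    complex_as_units: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def sc_indices(c: SampleCounts) -> dict:
    """Compute the four ratio indices named in the abstract."""
    return {
        "mean_length_of_clause": ratio(c.words, c.clauses),
        "complex_as_units_per_as_unit": ratio(c.complex_as_units, c.as_units),
        "dependent_clauses_per_clause": ratio(c.dependent_clauses, c.clauses),
        "complex_nominals_per_clause": ratio(c.complex_nominals, c.clauses),
    }


# Made-up counts for one speech sample, for illustration only.
example = SampleCounts(
    words=120,
    clauses=15,
    dependent_clauses=5,
    complex_nominals=9,
    as_units=10,
    complex_as_units=4,
)
indices = sc_indices(example)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Returning 0.0 for an empty denominator is a simplification chosen for this sketch; a real analysis might instead exclude samples with no clauses or AS-units.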

Original language: English (US)
Pages (from-to): 270-296
Number of pages: 27
Journal: Modern Language Journal
Issue number: 1
State: Published - Mar 1 2024

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
