This study attempts to construct automated scoring models for Chinese EFL (English as a Foreign Language) learners’ English-to-Chinese (E-C) translations in large-scale exams. Our data consisted of 900 human-scored translations of three source texts (an expository text, a narrative text and a mixed narrative-argumentative text), with 300 translations per source text. Text features were extracted using techniques such as n-gram matching, word alignment and Latent Semantic Analysis. Computer scoring models were constructed using multiple linear regression analysis, with the text features as independent variables and the human-assigned scores as the dependent variable. To determine the number of training texts required to yield the best results, five scoring models were developed, with training sets of 50, 100, 130, 150 and 180 texts of each text type, respectively. Results indicated that the correlation coefficients between the model-computed and human-assigned scores were above 0.8 for all five models. The model trained with 130 translated texts performed best on expository and narrative texts, while the model trained with 100 translated texts performed best on mixed narrative-argumentative texts. We therefore conclude that the text features extracted in this study are effective and that the finalized models can produce reliable scores for Chinese EFL learners’ E-C translations.
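The modelling procedure described above (multiple linear regression on text features, evaluated by the correlation between model-computed and human-assigned scores at several training-set sizes) can be illustrated with a minimal sketch. This is not the study's implementation: the three features, their weights, the noise level and all scores are synthetic placeholders, and using the texts outside the training set as the evaluation set is an assumption, since the abstract does not state how the models were tested.

```python
import random


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]  # [intercept, b1, b2, ...]


def predict(coefs, X):
    """Apply fitted coefficients to each feature row."""
    return [coefs[0] + sum(b * v for b, v in zip(coefs[1:], r)) for r in X]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Synthetic stand-in for 300 translations of one source text.
# Hypothetical features per text: n-gram overlap, alignment ratio, LSA similarity.
random.seed(0)
features = [[random.random() for _ in range(3)] for _ in range(300)]
# Synthetic "human" scores: an assumed linear signal plus rater noise.
scores = [
    2 + 5 * f[0] + 3 * f[1] + 4 * f[2] + random.gauss(0, 0.5) for f in features
]

# Train one model per training-set size and evaluate on the remaining texts.
results = {}
for n_train in (50, 100, 130, 150, 180):
    coefs = fit_linear_regression(features[:n_train], scores[:n_train])
    predicted = predict(coefs, features[n_train:])
    results[n_train] = pearson(predicted, scores[n_train:])
    print(f"train={n_train:3d}  test={300 - n_train:3d}  r={results[n_train]:.3f}")
```

With real data, each row of `features` would hold the values extracted from one translated text and `scores` would hold the corresponding human ratings. Comparing the entries of `results` across training sizes mirrors the comparison the study reports for each text type.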
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Linguistics and Language