Abstract
Question-asking, an essential yet understudied activity, holds significant implications for learning, creativity, and cognitive development. Research shows that asking complex, open-ended questions benefits learning more than asking closed ones. Previous research has characterized the complexity of open-ended questions using Bloom’s taxonomy, but measuring complexity remains challenging. Recent advances in natural language processing have enabled automated scoring of psychological tasks that aligns with human ratings. However, automatic assessment of open-ended questions remains understudied. We address this gap by fine-tuning transformer language models to predict human ratings of open-ended question complexity and comparing them to existing baseline measures (i.e., word count and semantic distance). Using previously collected human-rated responses and Bloom ratings from a creative question-asking task, we trained an encoder model (RoBERTa) and a large language model (Llama-2-7B). Our results reveal that RoBERTa's predictions correlated strongly with human ratings of complexity (r = .73), exceeding baseline measures and offering an efficient, lightweight solution suitable for broad adoption. Our fine-tuned Llama 2 model achieved stronger performance (r = .84), establishing a new benchmark for predictive accuracy. Thus, we demonstrate how language models can be used to automatically score the complexity of open-ended questions. Importantly, Llama 2 offers higher accuracy, while RoBERTa provides a replicable, accessible, and cost-effective option for everyday educational and psychological applications. Our work paves the way for automatic assessment of open-ended questions, which are critical across a wide range of cognitive domains.
| Original language | English (US) |
|---|---|
| Article number | 102090 |
| Journal | Thinking Skills and Creativity |
| Volume | 60 |
| State | Published - Jun 2026 |
All Science Journal Classification (ASJC) codes
- Education
Title
Automated Scoring of Question Complexity with Transformer Language Models