Toward LLM-Supported Automated Assessment of Critical Thinking Subskills
2510.12915v1
cs.CY, cs.CL, cs.LG
2025-10-17
Authors:
Marisa C. Peczuh, Nischal Ashok Kumar, Ryan Baker, Blair Lehman, Danielle Eisenberg, Caitlin Mills, Keerthi Chebrolu, Sudhip Nashi, Cadence Young, Brayden Liu, Sherry Lachman, Andrew Lan
Abstract
Critical thinking represents a fundamental competency in today's education
landscape. Developing critical thinking skills through timely assessment and
feedback is crucial; however, there has not been extensive work in the learning
analytics community on defining, measuring, and supporting critical thinking.
In this paper, we investigate the feasibility of measuring core "subskills"
that underlie critical thinking. We ground our work in an authentic task where
students operationalize critical thinking: student-written argumentative
essays. We developed a coding rubric based on an established skills progression
and completed human coding for a corpus of student essays. We then evaluated
three distinct approaches to automated scoring: zero-shot prompting, few-shot
prompting, and supervised fine-tuning, implemented across three large language
models (GPT-5, GPT-5-mini, and ModernBERT). GPT-5 with few-shot prompting
achieved the strongest results and demonstrated particular strength on
subskills with separable, frequent categories, while lower performance was
observed for subskills that required detection of subtle distinctions or rare
categories. Our results underscore critical trade-offs in automated critical
thinking assessment: proprietary models offer superior reliability at higher
cost, while open-source alternatives provide practical accuracy with reduced
sensitivity to minority categories. Our work represents an initial step toward
scalable assessment of higher-order reasoning skills across authentic
educational contexts.
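
To make the difference between the zero-shot and few-shot prompting conditions concrete, here is a minimal sketch of how a scoring prompt might be assembled. It is not the authors' prompt: the function `build_prompt`, the rubric text, the example essays, and the labels are all hypothetical placeholders, and the call to an actual model is omitted.

```python
def build_prompt(rubric, essay, examples=()):
    """Assemble a scoring prompt as plain text.

    With no examples the prompt is zero-shot; each (essay_text, label)
    pair in `examples` adds one labeled demonstration (few-shot).
    """
    parts = [
        "Score the essay on the subskill described by the rubric.",
        "Rubric:",
        rubric,
    ]
    for text, label in examples:
        parts += ["Example essay:", text, f"Label: {label}"]
    parts += ["Essay to score:", essay, "Label:"]
    return "\n".join(parts)


# Hypothetical rubric and essays, for illustration only.
RUBRIC = "0 = no claim is stated; 1 = a claim is stated"
TARGET = "School should start later because teenagers need more sleep."

zero_shot = build_prompt(RUBRIC, TARGET)
few_shot = build_prompt(
    RUBRIC,
    TARGET,
    examples=[
        ("I went to the park yesterday.", 0),
        ("Homework should be optional because it adds stress.", 1),
    ],
)
```

The only difference between the two conditions in this sketch is the block of labeled demonstrations placed before the target essay; supervised fine-tuning, by contrast, would update model weights on the human-coded essays rather than change the prompt.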