A Novel Framework for Augmenting Rating Scale Tests with LLM-Scored Text Data
2510.08663v1
cs.CL, cs.AI, cs.CY
2025-10-14
Authors:
Joe Watson, Ivan O'Conner, Chia-Wen Chen, Luning Sun, Fang Luo, David Stillwell
Abstract
Psychological assessments typically rely on structured rating scales, which
cannot incorporate the rich nuance of a respondent's natural language. This
study leverages recent LLM advances to harness qualitative data within a novel
conceptual framework, combining LLM-scored text and traditional rating-scale
items to create an augmented test. We demonstrate this approach using
depression as a case study, developing and assessing the framework on a
real-world sample of upper secondary students (n=693) and a corresponding
synthetic dataset (n=3,000). On held-out test sets, augmented tests achieved
statistically significant improvements in measurement precision and accuracy.
The information gain from the LLM items was equivalent to adding between 6.3
(real data) and 16.0 (synthetic data) items to the original 19-item test. Our
approach marks a conceptual shift in automated scoring that bypasses its
typical bottlenecks: instead of relying on pre-labelled data or complex
expert-created rubrics, we empirically select the most informative LLM scoring
instructions based on calculations of item information. This framework provides
a scalable approach for leveraging the growing stream of transcribed text to
enhance traditional psychometric measures, and we discuss its potential utility
in clinical health and beyond.
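
The selection step described in the abstract, choosing LLM scoring instructions by item information, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-parameter logistic (2PL) item model, the candidate names and their (a, b) parameters, the ability grid, and the `equivalent_items` ratio used to express an information gain as a number of average baseline items. The paper's actual model and calculations may differ.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta (assumed model)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def mean_information(a, b, thetas):
    """Average item information over a grid of ability values."""
    return sum(item_information(t, a, b) for t in thetas) / len(thetas)

def select_most_informative(candidates, thetas):
    """Return the name of the candidate with the highest mean information.

    candidates maps a hypothetical instruction name to its (a, b) parameters.
    """
    return max(candidates, key=lambda name: mean_information(*candidates[name], thetas))

def equivalent_items(added_info, baseline_info, n_baseline_items):
    """Express added information in units of one average baseline item (assumed definition)."""
    return added_info / (baseline_info / n_baseline_items)

# Hypothetical ability grid and candidate scoring instructions.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
candidates = {
    "prompt_a": (0.8, 0.0),  # low discrimination
    "prompt_b": (1.6, 0.0),  # high discrimination, centred on the grid
    "prompt_c": (1.6, 3.0),  # high discrimination, but off-target difficulty
}
best = select_most_informative(candidates, thetas)
```

In this toy setup the instruction that is both highly discriminating and well targeted to the ability range is selected, which is the intuition behind ranking candidate instructions by information rather than by agreement with pre-labelled data.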