VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions
2510.22798v1
cs.CL, cs.LG
2025-10-29
Authors:
Thu Phuong Nguyen, Duc M. Nguyen, Hyotaek Jeon, Hyunwook Lee, Hyunmin Song, Sungahn Ko, Taehwan Kim
Abstract
Automatically assessing handwritten mathematical solutions is an important
problem in educational technology with practical applications, but it remains a
significant challenge due to the diverse formats, unstructured layouts, and
symbolic complexity of student work. To address this challenge, we introduce
VEHME, a Vision-Language Model for Evaluating Handwritten Mathematics
Expressions, designed to assess open-form handwritten math responses with high
accuracy and interpretable reasoning traces. VEHME integrates a two-phase
training pipeline: (i) supervised fine-tuning using structured reasoning data,
and (ii) reinforcement learning that aligns model outputs with
multi-dimensional grading objectives, including correctness, reasoning depth,
and error localization. To enhance spatial understanding, we propose an
Expression-Aware Visual Prompting Module, trained on our synthesized multi-line
math expressions dataset to robustly guide attention in visually heterogeneous
inputs. Evaluated on AIHub and FERMAT datasets, VEHME achieves state-of-the-art
performance among open-source models and approaches the accuracy of proprietary
systems, demonstrating its potential as a scalable and accessible tool for
automated math assessment. Our training and experiment code is publicly
available at our GitHub repository.