DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry
2510.22340v1
cs.AI, cs.CL, cs.CV, cs.LG
2025-10-29
Авторы:
Changti Wu, Shijie Lian, Zihao Liu, Lei Zhang, Laurence Tianruo Yang, Kai Chen
Abstract
Solid geometry problem solving demands spatial mathematical reasoning that
integrates spatial intelligence and symbolic reasoning. However, most existing
multimodal mathematical reasoning benchmarks focus primarily on 2D plane
geometry, rely on static datasets prone to data contamination and memorization,
and evaluate models solely by final answers, overlooking the reasoning process.
To address these limitations, we introduce DynaSolidGeo, the first dynamic
benchmark for evaluating genuine spatial reasoning in Vision-Language Models
(VLMs). Constructed through a semi-automatic annotation pipeline, DynaSolidGeo
contains 503 expert-curated seed questions that can, in principle, dynamically
generate an unbounded number of diverse multimodal text-visual instances.
Beyond answer accuracy, we incorporate process evaluation based on
expert-annotated reasoning chains to measure logical validity and causal
coherence. Experiments across representative open-source and closed-source VLMs
reveal large performance gaps, severe degradation in dynamic settings, and poor
performance on tasks requiring high-level spatial intelligence, such as mental
rotation and visualization. The code and dataset are available at
\href{https://zgca-ai4edu.github.io/DynaSolidGeo/}{DynaSolidGeo}.