Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
2510.03863v1
cs.AI, cs.CR
2025-10-08
Авторы:
Arina Kharlamova, Bowei He, Chen Ma, Xue Liu
Abstract
Online services rely on CAPTCHAs as a first line of defense against automated
abuse, yet recent advances in multi-modal large language models (MLLMs) have
eroded the effectiveness of conventional designs that focus on text recognition
or 2D image understanding. To address this challenge, we present Spatial
CAPTCHA, a novel human-verification framework that leverages fundamental
differences in spatial reasoning between humans and MLLMs. Unlike existing
CAPTCHAs which rely on low-level perception tasks that are vulnerable to modern
AI, Spatial CAPTCHA generates dynamic questions requiring geometric reasoning,
perspective-taking, occlusion handling, and mental rotation. These skills are
intuitive for humans but difficult for state-of-the-art (SOTA) AI systems. The
system employs a procedural generation pipeline with constraint-based
difficulty control, automated correctness verification, and human-in-the-loop
validation to ensure scalability, robustness, and adaptability. Evaluation on a
corresponding benchmark, Spatial-CAPTCHA-Bench, demonstrates that humans vastly
outperform 10 state-of-the-art MLLMs, with the best model achieving only 31.0%
Pass@1 accuracy. Furthermore, we compare Spatial CAPTCHA with Google reCAPTCHA,
which confirms its effectiveness as both a security mechanism and a diagnostic
tool for spatial reasoning in AI.
Ссылки и действия
Дополнительные ресурсы: