Executable Analytic Concepts as the Missing Link Between VLM Insight and Precise Manipulation
2510.07975v1
cs.RO, cs.AI
2025-10-11
Авторы:
Mingyang Sun, Jiude Wei, Qichen He, Donglin Wang, Cewu Lu, Jianhua Sun
Abstract
Enabling robots to perform precise and generalized manipulation in
unstructured environments remains a fundamental challenge in embodied AI. While
Vision-Language Models (VLMs) have demonstrated remarkable capabilities in
semantic reasoning and task planning, a significant gap persists between their
high-level understanding and the precise physical execution required for
real-world manipulation. To bridge this "semantic-to-physical" gap, we
introduce GRACE, a novel framework that grounds VLM-based reasoning through
executable analytic concepts (EAC)-mathematically defined blueprints that
encode object affordances, geometric constraints, and semantics of
manipulation. Our approach integrates a structured policy scaffolding pipeline
that turn natural language instructions and visual information into an
instantiated EAC, from which we derive grasp poses, force directions and plan
physically feasible motion trajectory for robot execution. GRACE thus provides
a unified and interpretable interface between high-level instruction
understanding and low-level robot control, effectively enabling precise and
generalizable manipulation through semantic-physical grounding. Extensive
experiments demonstrate that GRACE achieves strong zero-shot generalization
across a variety of articulated objects in both simulated and real-world
environments, without requiring task-specific training.
Ссылки и действия
Дополнительные ресурсы: