GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies

2511.04357v1 cs.RO, cs.CV 2025-11-08

Авторы:

Maëlic Neau, Zoe Falomir, Paulo E. Santos, Anne-Gwenn Bosser, Cédric Buche

Abstract

Deploying autonomous robots that can learn new skills from demonstrations is an important challenge of modern robotics. Existing solutions often apply end-to-end imitation learning with Vision-Language Action (VLA) models or symbolic approaches with Action Model Learning (AML). On the one hand, current VLA models are limited by the lack of high-level symbolic planning, which hinders their abilities in long-horizon tasks. On the other hand, symbolic approaches in AML lack generalization and scalability perspectives. In this paper we present a new neuro-symbolic approach, GraSP-VLA, a framework that uses a Continuous Scene Graph representation to generate a symbolic representation of human demonstrations. This representation is used to generate new planning domains during inference and serves as an orchestrator for low-level VLA policies, scaling up the number of actions that can be reproduced in a row. Our results show that GraSP-VLA is effective for modeling symbolic representations on the task of automatic planning domain generation from observations. In addition, results on real-world experiments show the potential of our Continuous Scene Graph representation to orchestrate low-level VLA policies in long-horizon tasks.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies

Авторы:

Abstract

Ссылки и действия

Связанные статьи

From Generated Human Videos to Physically Plausible Robot Trajectories

Sign Language Recognition using Bidirectional Reservoir Computing

FOM-Nav: Frontier-Object Maps for Object Goal Navigation

Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

Estimation of Kinematic Motion from Dashcam Footage

Навигация