GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies
2511.04357v1
cs.RO, cs.CV
2025-11-08
Authors:
Maëlic Neau, Zoe Falomir, Paulo E. Santos, Anne-Gwenn Bosser, Cédric Buche
Abstract
Deploying autonomous robots that can learn new skills from demonstrations is
an important challenge of modern robotics. Existing solutions often apply
end-to-end imitation learning with Vision-Language-Action (VLA) models or
symbolic approaches with Action Model Learning (AML). On the one hand, current
VLA models are limited by the lack of high-level symbolic planning, which
hinders their performance on long-horizon tasks. On the other hand, symbolic
approaches in AML generalize and scale poorly. In this
paper we present a new neuro-symbolic approach, GraSP-VLA, a framework that
uses a Continuous Scene Graph representation to generate a symbolic
representation of human demonstrations. This representation is used to generate
new planning domains during inference and serves as an orchestrator for
low-level VLA policies, increasing the number of actions that can be reproduced
in sequence. Our results show that GraSP-VLA is effective for modeling symbolic
representations on the task of automatic planning domain generation from
observations. In addition, results on real-world experiments show the potential
of our Continuous Scene Graph representation to orchestrate low-level VLA
policies in long-horizon tasks.