Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs
2510.27558v1
cs.RO, cs.AI, cs.LG
2025-11-04
Authors:
Sushil Samuel Dinesh, Shinkyu Park
Abstract
This paper presents a framework that leverages pre-trained foundation models
for robotic manipulation without domain-specific training. The framework
integrates off-the-shelf models, combining multimodal perception from
foundation models with a general-purpose reasoning model capable of robust task
sequencing. Scene graphs, dynamically maintained within the framework, provide
spatial awareness and enable consistent reasoning about the environment. The
framework is evaluated through a series of tabletop robotic manipulation
experiments, and the results highlight its potential for building robotic
manipulation systems directly on top of off-the-shelf foundation models.