Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs

2510.27558v1 cs.RO, cs.AI, cs.LG 2025-11-04

Авторы:

Sushil Samuel Dinesh, Shinkyu Park

Abstract

This paper presents a framework that leverages pre-trained foundation models for robotic manipulation without domain-specific training. The framework integrates off-the-shelf models, combining multimodal perception from foundation models with a general-purpose reasoning model capable of robust task sequencing. Scene graphs, dynamically maintained within the framework, provide spatial awareness and enable consistent reasoning about the environment. The framework is evaluated through a series of tabletop robotic manipulation experiments, and the results highlight its potential for building robotic manipulation systems directly on top of off-the-shelf foundation models.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Autonomous Reinforcement Learning Robot Control with Intel's Loihi 2 Neuromorphi...

Real-World Reinforcement Learning of Active Perception Behaviors

Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical...

Learning Sim-to-Real Humanoid Locomotion in 15 Minutes

Phase-Adaptive LLM Framework with Multi-Stage Validation for Construction Robot ...

Навигация