A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics

2510.27033v1 cs.RO, cs.AI, cs.CV 2025-11-04

Authors:

Simindokht Jahangard, Mehrzad Mohammadi, Abhinav Dhall, Hamid Rezatofighi

Abstract

Visual reasoning, particularly spatial reasoning, is a challenging cognitive task that requires understanding object relationships and their interactions within complex environments, especially in the robotics domain. Existing vision-language models (VLMs) excel at perception tasks but struggle with fine-grained spatial reasoning due to their implicit, correlation-driven reasoning and reliance solely on images. We propose a novel neuro-symbolic framework that integrates both panoramic-image and 3D point cloud information, combining neural perception with symbolic reasoning to explicitly model spatial and logical relationships. Our framework consists of a perception module for detecting entities and extracting attributes, and a reasoning module that constructs a structured scene graph to support precise, interpretable queries. Evaluated on the JRDB-Reasoning dataset, our approach demonstrates superior performance and reliability in crowded, human-built environments while maintaining a lightweight design suitable for robotics and embodied AI applications.
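
To make the idea of a structured scene graph that supports interpretable queries more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Entity` class, the `near` and `left_of` relations, the 1.5 m threshold, the coordinate convention, and the example scene are all hypothetical choices made for illustration, and the hand-written entities stand in for what a perception module would produce.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Entity:
    """A detected entity; every field is a hypothetical stand-in for perception output."""
    name: str
    category: str
    position: tuple  # (x, y, z) centroid in metres, robot frame (assumed)
    attributes: dict = field(default_factory=dict)


def build_scene_graph(entities, near_threshold=1.5):
    """Return a set of (subject, relation, object) triples derived from 3D centroids."""
    edges = set()
    for a in entities:
        for b in entities:
            if a is b:
                continue
            # Symbolic "near" relation from Euclidean distance (threshold is assumed).
            if math.dist(a.position, b.position) <= near_threshold:
                edges.add((a.name, "near", b.name))
            # Assumed convention: a smaller x coordinate means further to the left.
            if a.position[0] < b.position[0]:
                edges.add((a.name, "left_of", b.name))
    return edges


def query(entities, edges, category, relation, target, **attrs):
    """Names of entities of `category` with `attrs` that stand in `relation` to `target`."""
    return sorted(
        e.name
        for e in entities
        if e.category == category
        and all(e.attributes.get(k) == v for k, v in attrs.items())
        and (e.name, relation, target) in edges
    )


# Hand-written scene standing in for detector output.
scene = [
    Entity("person_1", "person", (0.0, 2.0, 0.0), {"shirt": "red"}),
    Entity("person_2", "person", (3.0, 2.5, 0.0), {"shirt": "blue"}),
    Entity("chair_1", "chair", (1.0, 2.0, 0.0)),
]
graph = build_scene_graph(scene)

# "The person in a red shirt near the chair"
result = query(scene, graph, "person", "near", "chair_1", shirt="red")
print(result)
```

Because every relation is an explicit triple, a grounded answer can be traced back to the edges that produced it, which is the kind of interpretability the abstract attributes to symbolic reasoning.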

Related articles

Distracted Robot: How Visual Clutter Undermine Robotic Manipulation

Obstruction reasoning for robotic grasping

RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as A...

SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied ...

Stable Multi-Drone GNSS Tracking System for Marine Robots
