UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene

arXiv:2510.06754v1 | cs.RO, cs.CV, cs.LG | 2025-10-10

Authors:

Christian Maurer, Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki

Abstract

Comprehensive visual, geometric, and semantic understanding of a 3D scene is crucial for the successful execution of robotic tasks, especially in unstructured and complex environments. Additionally, to make robust decisions, a robot must be able to evaluate the reliability of the information it perceives. While recent advances in 3D neural feature fields have enabled robots to leverage features from pretrained foundation models for tasks such as language-guided manipulation and navigation, existing methods suffer from two critical limitations: (i) they are typically scene-specific, and (ii) they cannot model the uncertainty of their predictions. We present UniFField, a unified, uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation while also predicting the uncertainty of each modality. Our approach, which can be applied zero-shot to any new environment, incrementally integrates RGB-D images into a voxel-based feature representation as the robot explores the scene, updating its uncertainty estimates at the same time. We evaluate our uncertainty estimates and show that they accurately describe the model's prediction errors in scene reconstruction and semantic feature prediction. Furthermore, we successfully leverage the feature predictions and their respective uncertainties for an active object search task on a mobile manipulator robot, demonstrating the representation's capability to support robust decision-making.
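The abstract does not specify how each RGB-D frame is fused into the voxel representation or how uncertainty is updated; UniFField predicts uncertainty with a learned model. The sketch below is therefore only a hypothetical, non-learned stand-in meant to illustrate the general idea of incrementally integrating per-point features into a sparse voxel grid while keeping a per-voxel uncertainty that changes as new frames arrive. It substitutes Welford's running mean and variance for the paper's method. All names (`SparseFeatureGrid`, `VoxelStats`, `integrate`, `query`, `voxel_size`) are assumptions made for illustration, and the back-projection of depth pixels into 3D points is assumed to happen elsewhere.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

VoxelKey = Tuple[int, ...]


@dataclass
class VoxelStats:
    """Running mean/variance of features seen in one voxel (Welford's algorithm).

    Hypothetical stand-in: UniFField learns its uncertainty; here the sample
    variance of the fused observations is used as a simple proxy.
    """
    count: int = 0
    mean: List[float] = field(default_factory=list)
    m2: List[float] = field(default_factory=list)  # per-dimension sum of squared deviations

    def update(self, feature: Sequence[float]) -> None:
        if self.count == 0:
            self.mean = [0.0] * len(feature)
            self.m2 = [0.0] * len(feature)
        self.count += 1
        for i, x in enumerate(feature):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self) -> float:
        """Mean per-dimension sample variance; infinite until two observations exist."""
        if self.count < 2:
            return math.inf
        return sum(self.m2) / (len(self.m2) * (self.count - 1))


class SparseFeatureGrid:
    """Hypothetical sparse voxel grid that fuses point features frame by frame."""

    def __init__(self, voxel_size: float) -> None:
        self.voxel_size = voxel_size
        self.voxels: Dict[VoxelKey, VoxelStats] = {}

    def key(self, point: Sequence[float]) -> VoxelKey:
        # Map a 3D point (metres) to the integer index of the voxel containing it.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def integrate(self, points: Sequence[Sequence[float]],
                  features: Sequence[Sequence[float]]) -> None:
        """Fuse one back-projected frame: points[i] carries features[i]."""
        for p, f in zip(points, features):
            self.voxels.setdefault(self.key(p), VoxelStats()).update(f)

    def query(self, point: Sequence[float]) -> Tuple[List[float], float]:
        """Return (fused feature, uncertainty); unobserved space is maximally uncertain."""
        stats = self.voxels.get(self.key(point))
        if stats is None:
            return [], math.inf
        return list(stats.mean), stats.variance()


grid = SparseFeatureGrid(voxel_size=0.05)
grid.integrate([(0.01, 0.02, 0.03)], [[1.0, 0.0]])  # first frame sees the voxel
grid.integrate([(0.02, 0.01, 0.04)], [[3.0, 2.0]])  # a second frame sees it again
mean, var = grid.query((0.0, 0.0, 0.0))
```

In this toy usage both observations fall into voxel `(0, 0, 0)`, so the fused feature is their mean and the uncertainty is their sample variance, while any voxel that no frame has observed reports infinite uncertainty. A learned approach such as the one the abstract describes can also estimate uncertainty from a single view and generalize to unseen scenes, which this simple statistic cannot.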
