Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields
2510.03104v1
cs.CV, cs.RO
2025-10-07
Авторы:
Zhiting Mei, Ola Shorinwa, Anirudha Majumdar
Abstract
Semantic distillation in radiance fields has spurred significant advances in
open-vocabulary robot policies, e.g., in manipulation and navigation, founded
on pretrained semantics from large vision models. While prior work has
demonstrated the effectiveness of visual-only semantic features (e.g., DINO and
CLIP) in Gaussian Splatting and neural radiance fields, the potential benefit
of geometry-grounding in distilled fields remains an open question. In
principle, visual-geometry features seem very promising for spatial tasks such
as pose estimation, prompting the question: Do geometry-grounded semantic
features offer an edge in distilled fields? Specifically, we ask three critical
questions: First, does spatial-grounding produce higher-fidelity geometry-aware
semantic features? We find that image features from geometry-grounded backbones
contain finer structural details compared to their counterparts. Secondly, does
geometry-grounding improve semantic object localization? We observe no
significant difference in this task. Thirdly, does geometry-grounding enable
higher-accuracy radiance field inversion? Given the limitations of prior work
and their lack of semantics integration, we propose a novel framework SPINE for
inverting radiance fields without an initial guess, consisting of two core
components: coarse inversion using distilled semantics, and fine inversion
using photometric-based optimization. Surprisingly, we find that the pose
estimation accuracy decreases with geometry-grounded features. Our results
suggest that visual-only features offer greater versatility for a broader range
of downstream tasks, although geometry-grounded features contain more geometric
detail. Notably, our findings underscore the necessity of future research on
effective strategies for geometry-grounding that augment the versatility and
performance of pretrained semantic features.
Ссылки и действия
Дополнительные ресурсы: