Do You Know Where Your Camera Is? View-Invariant Policy Learning with Camera Conditioning
2510.02268v1
cs.RO, cs.CV
2025-10-04
Авторы:
Tianchong Jiang, Jingtian Ji, Xiangshan Tan, Jiading Fang, Anand Bhattad, Vitor Guizilini, Matthew R. Walter
Abstract
We study view-invariant imitation learning by explicitly conditioning
policies on camera extrinsics. Using Plucker embeddings of per-pixel rays, we
show that conditioning on extrinsics significantly improves generalization
across viewpoints for standard behavior cloning policies, including ACT,
Diffusion Policy, and SmolVLA. To evaluate policy robustness under realistic
viewpoint shifts, we introduce six manipulation tasks in RoboSuite and
ManiSkill that pair "fixed" and "randomized" scene variants, decoupling
background cues from camera pose. Our analysis reveals that policies without
extrinsics often infer camera pose using visual cues from static backgrounds in
fixed scenes; this shortcut collapses when workspace geometry or camera
placement shifts. Conditioning on extrinsics restores performance and yields
robust RGB-only control without depth. We release the tasks, demonstrations,
and code at https://ripl.github.io/know_your_camera/ .
Ссылки и действия
Дополнительные ресурсы: