SCOPE: Semantic Conditioning for Sim2Real Category-Level Object Pose Estimation in Robotics
2509.24572v1
cs.CV, cs.RO
2025-10-01
Authors:
Peter Hönig, Stefan Thalhammer, Jean-Baptiste Weibel, Matthias Hirschmanner, Markus Vincze
Abstract
Object manipulation requires accurate object pose estimation. In open
environments, robots encounter unknown objects, which requires semantic
understanding in order to generalize both to known categories and beyond. To
resolve this challenge, we present SCOPE, a diffusion-based category-level
object pose estimation model that eliminates the need for discrete category
labels by leveraging DINOv2 features as continuous semantic priors. By
combining these DINOv2 features with photorealistic training data and a noise
model for point normals, we reduce the Sim2Real gap in category-level object
pose estimation. Furthermore, injecting the continuous semantic priors via
cross-attention enables SCOPE to learn canonicalized object coordinate systems
across object instances beyond the distribution of known categories. SCOPE
outperforms the current state of the art in synthetically trained
category-level object pose estimation, achieving a relative improvement of
31.9% on the 5°5cm metric. Additional experiments on two instance-level
datasets demonstrate generalization beyond known object categories, enabling
grasping of unseen objects from unknown categories with a success rate of up to
100%. Code available: https://github.com/hoenigpeter/scope.
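
For readers unfamiliar with the 5°5cm metric mentioned above: a pose estimate is usually counted as correct when its rotation error is below 5 degrees and its translation error is below 5 cm. The following is a minimal sketch of that check, not the SCOPE evaluation code. The function names are hypothetical, translations are assumed to be in metres, and the symmetry handling that benchmark implementations typically apply to rotationally symmetric objects is omitted.

```python
import numpy as np


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def translation_error_cm(t_pred, t_gt):
    """Euclidean distance between translations (assumed metres), in cm."""
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    return float(100.0 * np.linalg.norm(diff))


def within_5deg_5cm(R_pred, t_pred, R_gt, t_gt, deg_thresh=5.0, cm_thresh=5.0):
    """True if both rotation and translation errors are under the thresholds."""
    return (rotation_error_deg(R_pred, R_gt) < deg_thresh
            and translation_error_cm(t_pred, t_gt) < cm_thresh)


def rot_z(deg):
    """Helper for the example: rotation about the z axis by `deg` degrees."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])


# Example: 3 degrees off in rotation and 2 cm off in translation passes.
R_gt, t_gt = np.eye(3), np.zeros(3)
print(within_5deg_5cm(rot_z(3.0), [0.02, 0.0, 0.0], R_gt, t_gt))
```

The reported score is then the fraction of test instances for which this check passes.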