Canonical Space Representation for 4D Panoptic Segmentation of Articulated Objects
2511.05356v1
cs.CV, I.2.10; I.4.6; I.5.1; I.5.4
2025-11-11
Авторы:
Manuel Gomes, Bogdan Raducanu, Miguel Oliveira
Abstract
Articulated object perception presents significant challenges in computer
vision, particularly because most existing methods ignore temporal dynamics
despite the inherently dynamic nature of such objects. The use of 4D temporal
data has not been thoroughly explored in articulated object perception and
remains unexamined for panoptic segmentation. The lack of a benchmark dataset
further hurt this field. To this end, we introduce Artic4D as a new dataset
derived from PartNet Mobility and augmented with synthetic sensor data,
featuring 4D panoptic annotations and articulation parameters. Building on this
dataset, we propose CanonSeg4D, a novel 4D panoptic segmentation framework.
This approach explicitly estimates per-frame offsets mapping observed object
parts to a learned canonical space, thereby enhancing part-level segmentation.
The framework employs this canonical representation to achieve consistent
alignment of object parts across sequential frames. Comprehensive experiments
on Artic4D demonstrate that the proposed CanonSeg4D outperforms state of the
art approaches in panoptic segmentation accuracy in more complex scenarios.
These findings highlight the effectiveness of temporal modeling and canonical
alignment in dynamic object understanding, and pave the way for future advances
in 4D articulated object perception.