How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?
2511.05449v1
cs.CV, cs.LG
2025-11-11
Авторы:
Tuan Anh Tran, Duy M. H. Nguyen, Hoai-Chau Tran, Michael Barz, Khoa D. Doan, Roger Wattenhofer, Ngo Anh Vien, Mathias Niepert, Daniel Sonntag, Paul Swoboda
Abstract
Recent advances in 3D point cloud transformers have led to state-of-the-art
results in tasks such as semantic segmentation and reconstruction. However,
these models typically rely on dense token representations, incurring high
computational and memory costs during training and inference. In this work, we
present the finding that tokens are remarkably redundant, leading to
substantial inefficiency. We introduce gitmerge3D, a globally informed graph
token merging method that can reduce the token count by up to 90-95% while
maintaining competitive performance. This finding challenges the prevailing
assumption that more tokens inherently yield better performance and highlights
that many current models are over-tokenized and under-optimized for
scalability. We validate our method across multiple 3D vision tasks and show
consistent improvements in computational efficiency. This work is the first to
assess redundancy in large-scale 3D transformer models, providing insights into
the development of more efficient 3D foundation architectures. Our code and
checkpoints are publicly available at https://gitmerge3d.github.io
Ссылки и действия
Дополнительные ресурсы: