Clebsch-Gordan Transformer: Fast and Global Equivariant Attention

2509.24093v1 cs.LG, cs.CV, cs.RO 2025-10-01

Авторы:

Owen Lewis Howell, Linfeng Zhao, Xupeng Zhu, Yaoyao Qian, Haojie Huang, Lingfeng Sun, Wil Thomason, Robert Platt, Robin Walters

Abstract

The global attention mechanism is one of the keys to the success of transformer architecture, but it incurs quadratic computational costs in relation to the number of tokens. On the other hand, equivariant models, which leverage the underlying geometric structures of problem instance, often achieve superior accuracy in physical, biochemical, computer vision, and robotic tasks, at the cost of additional compute requirements. As a result, existing equivariant transformers only support low-order equivariant features and local context windows, limiting their expressiveness and performance. This work proposes Clebsch-Gordan Transformer, achieving efficient global attention by a novel Clebsch-Gordon Convolution on $\SO(3)$ irreducible representations. Our method enables equivariant modeling of features at all orders while achieving ${O}(N \log N)$ input token complexity. Additionally, the proposed method scales well with high-order irreducible features, by exploiting the sparsity of the Clebsch-Gordon matrix. Lastly, we also incorporate optional token permutation equivariance through either weight sharing or data augmentation. We benchmark our method on a diverse set of benchmarks including n-body simulation, QM9, ModelNet point cloud classification and a robotic grasping dataset, showing clear gains over existing equivariant transformers in GPU memory size, speed, and accuracy.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

Clebsch-Gordan Transformer: Fast and Global Equivariant Attention

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Learning Massively Multitask World Models for Continuous Control

AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention

CorVS: Person Identification via Video Trajectory-Sensor Correspondence in a Rea...

Навигация