Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras
2510.26614v1
cs.CV, cs.RO
2025-11-01
Авторы:
Christoffer Koo Øhrstrøm, Ronja Güldenring, Lazaros Nalpantidis
Abstract
We propose tokenization of events and present a tokenizer, Spiking Patches,
specifically designed for event cameras. Given a stream of asynchronous and
spatially sparse events, our goal is to discover an event representation that
preserves these properties. Prior works have represented events as frames or as
voxels. However, while these representations yield high accuracy, both frames
and voxels are synchronous and decrease the spatial sparsity. Spiking Patches
gives the means to preserve the unique properties of event cameras and we show
in our experiments that this comes without sacrificing accuracy. We evaluate
our tokenizer using a GNN, PCN, and a Transformer on gesture recognition and
object detection. Tokens from Spiking Patches yield inference times that are up
to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames. We
achieve this while matching their accuracy and even surpassing in some cases
with absolute improvements up to 3.8 for gesture recognition and up to 1.4 for
object detection. Thus, tokenization constitutes a novel direction in
event-based vision and marks a step towards methods that preserve the
properties of event cameras.
Ссылки и действия
Дополнительные ресурсы: