On Structured State-Space Duality
2510.04944v1
cs.LG, cs.CL, cs.CV, stat.ML
2025-10-08
Авторы:
Jerry Yao-Chieh Hu, Xiwen Zhang, Weimin Wu, Han Liu
Abstract
Structured State-Space Duality (SSD) [Dao & Gu, ICML 2024] is an equivalence
between a simple Structured State-Space Model (SSM) and a masked attention
mechanism. In particular, a state-space model with a scalar-times-identity
state matrix is equivalent to a masked self-attention with a $1$-semiseparable
causal mask. Consequently, the same sequence transformation (model) has two
algorithmic realizations: as a linear-time $O(T)$ recurrence or as a
quadratic-time $O(T^2)$ attention. In this note, we formalize and generalize
this duality: (i) we extend SSD from the scalar-identity case to general
diagonal SSMs (diagonal state matrices); (ii) we show that these diagonal SSMs
match the scalar case's training complexity lower bounds while supporting
richer dynamics; (iii) we establish a necessary and sufficient condition under
which an SSM is equivalent to $1$-semiseparable masked attention; and (iv) we
show that such duality fails to extend to standard softmax attention due to
rank explosion. Together, these results tighten bridge between recurrent SSMs
and Transformers, and widen the design space for expressive yet efficient
sequence models.