On the Reasoning Abilities of Masked Diffusion Language Models
2510.13117v1
cs.LG, cs.AI, cs.CL
2025-10-17
Авторы:
Anej Svete, Ashish Sabharwal
Abstract
Masked diffusion models (MDMs) for text offer a compelling alternative to
traditional autoregressive language models. Parallel generation makes them
efficient, but their computational capabilities and the limitations inherent to
their parallelism remain largely unexplored. To this end, we characterize what
types of reasoning problems MDMs can provably solve and how efficiently. We do
this by connecting MDMs to the well-understood reasoning frameworks of chain of
thought (CoT) and padded looped transformers (PLTs) in the finite-precision
log-width setting: We show that MDMs and polynomially-padded PLTs are, in fact,
equivalent in this setting, and that MDMs can solve all problems that
CoT-augmented transformers can. Moreover, we showcase classes of problems
(including regular languages) for which MDMs are inherently more efficient than
CoT transformers, where parallel generation allows for substantially faster
reasoning.
Ссылки и действия
Дополнительные ресурсы: