Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
2509.25035v2
cs.CL, cs.AI, cs.LG
2025-10-02
Authors:
Haoyang Zheng, Xinyang Liu, Cindy Xiangrui Kong, Nan Jiang, Zheyuan Hu, Weijian Luo, Wei Deng, Guang Lin
Abstract
Fast and high-quality language generation is the holy grail that people
pursue in the age of AI. In this work, we introduce Discrete Diffusion
Divergence Instruct (DiDi-Instruct), a training-based method that initializes
from a pre-trained (masked) discrete diffusion language model (dLLM) and
distills a few-step student for fast generation. The resulting DiDi-Instruct
model matches or surpasses its dLLM teacher and the GPT-2 baseline while
enabling up to 64$\times$ sampling acceleration. The theoretical
foundation of DiDi-Instruct is a novel framework based on integral
KL-divergence minimization, which yields a practical training algorithm. We
further introduce grouped reward normalization, intermediate-state matching,
and a reward-guided ancestral sampler, which together significantly improve
training stability, model coverage, and inference quality. On OpenWebText,
DiDi-Instruct achieves perplexities ranging from 62.2 with 8 function
evaluations (NFEs) to 18.4 with 128 NFEs, outperforming prior accelerated dLLMs
and the GPT-2 baseline. These gains come with a negligible entropy loss (around
$1\%$), and DiDi-Instruct requires more than $20\times$ less additional training
wall-clock time than competing dLLM distillation methods. We
further validate the robustness and effectiveness of DiDi-Instruct through
extensive ablation studies, model scaling, and the generation of discrete
protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective
distillation method, enabling language generation in the blink of an eye. We
will release both code and models at github.com/haoyangzheng-ai/didi-instruct.
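The abstract names grouped reward normalization without specifying it. A minimal sketch, assuming a GRPO-style rule in which rewards are standardized within fixed-size groups of samples (subtract the group mean, divide by the group standard deviation). The function name `grouped_reward_normalization` and the `group_size` and `eps` parameters are hypothetical and are not taken from the DiDi-Instruct code:

```python
import numpy as np


def grouped_reward_normalization(rewards, group_size, eps=1e-8):
    """Standardize rewards within consecutive groups of `group_size` samples.

    Assumption: GRPO-style normalization (group mean subtracted, divided by
    group std); the exact rule used by DiDi-Instruct may differ.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    if rewards.ndim != 1 or rewards.size % group_size != 0:
        raise ValueError("rewards must be 1-D with length divisible by group_size")
    groups = rewards.reshape(-1, group_size)      # one row per group
    mean = groups.mean(axis=1, keepdims=True)     # per-group mean
    std = groups.std(axis=1, keepdims=True)       # per-group (population) std
    normalized = (groups - mean) / (std + eps)    # eps guards constant groups
    return normalized.reshape(-1)


if __name__ == "__main__":
    raw = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
    print(grouped_reward_normalization(raw, group_size=3))
```

Because every group is rescaled to zero mean and unit variance, groups whose raw rewards live on different scales (here 1 to 3 versus 10 to 30) yield the same normalized signal, which is one plausible way such a step could stabilize training.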
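The student is a masked discrete diffusion model, so, assuming one network call per sampling step, each step costs one function evaluation (NFE) and the speedup over the teacher in NFEs is the ratio of their step counts. The sketch below shows a plain ancestral sampler for a masked diffusion model under an assumed linear schedule $\alpha_t = 1 - t$ (the fraction of unmasked tokens at time $t$). It is not the paper's reward-guided sampler, and `ancestral_sample`, `toy_denoiser`, `VOCAB_SIZE`, and the `MASK` id are hypothetical stand-ins:

```python
import numpy as np

MASK = -1        # hypothetical id for the [MASK] token
VOCAB_SIZE = 5   # hypothetical toy vocabulary size


def ancestral_sample(denoiser, seq_len, num_steps, rng):
    """Few-step ancestral sampling for a masked diffusion model.

    Assumes a linear schedule alpha_t = 1 - t and a `denoiser(z)` that returns
    per-position token probabilities of shape (seq_len, vocab).
    """
    z = np.full(seq_len, MASK, dtype=np.int64)     # start fully masked at t = 1
    times = np.linspace(1.0, 0.0, num_steps + 1)   # t: 1 -> 0 in num_steps steps
    nfe = 0
    for t, s in zip(times[:-1], times[1:]):
        alpha_t, alpha_s = 1.0 - t, 1.0 - s
        probs = denoiser(z)                        # one network evaluation per step
        nfe += 1
        # A masked token is revealed with prob (alpha_s - alpha_t) / (1 - alpha_t);
        # already revealed tokens are carried over unchanged.
        p_reveal = (alpha_s - alpha_t) / (1.0 - alpha_t)
        reveal = (z == MASK) & (rng.random(seq_len) < p_reveal)
        for i in np.flatnonzero(reveal):
            z[i] = rng.choice(probs.shape[1], p=probs[i])
    return z, nfe


def toy_denoiser(z):
    # Hypothetical denoiser: always predicts token 2 with probability 1.
    probs = np.zeros((len(z), VOCAB_SIZE))
    probs[:, 2] = 1.0
    return probs


if __name__ == "__main__":
    tokens, nfe = ancestral_sample(toy_denoiser, seq_len=16, num_steps=8,
                                   rng=np.random.default_rng(0))
    print(nfe, tokens)
```

Because the final step moves from $t = 1/K$ to $s = 0$, the reveal probability there equals 1, so no mask tokens remain after $K$ steps and the sampler spends exactly $K$ NFEs.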