UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement
2510.20441v1
cs.SD, cs.AI
2025-10-25
Authors:
Haoyin Yan, Chengwei Liu, Shaofei Xue, Xiaotao Liang, Zheng Xue
Abstract
The development of neural audio codecs (NACs) has largely promoted
applications of language models (LMs) to speech processing and understanding.
However, the effectiveness of autoregressive (AR) LM-based models in unifying
different sub-tasks of speech enhancement (SE) has not yet been verified.
In this work, we propose UniSE, a unified decoder-only LM-based framework to
handle different SE tasks, including speech restoration, target speaker
extraction, and speech separation. It takes input speech features as conditions
and generates discrete tokens of the target speech using AR modeling, which
makes the distinct learning patterns of multiple tasks compatible with each
other. Experiments on several benchmarks indicate that the proposed UniSE can
achieve competitive performance compared to discriminative and generative
baselines, showing the capacity of LMs in unifying SE tasks. The demo page is
available here: https://github.com/hyyan2k/UniSE.
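The abstract describes UniSE only at a high level: input speech features act as a condition, and the decoder-only LM emits discrete tokens of the target speech one at a time. The following is a minimal Python sketch of that kind of conditional autoregressive decoding loop, assuming greedy decoding and an end-of-sequence token. `generate_tokens`, `toy_decoder`, the 5-entry vocabulary, and the EOS id are hypothetical illustrations, not UniSE's actual model, codec tokenizer, or sampling strategy.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a decoder-only LM: maps the current prefix
# (conditioning features plus already generated tokens) to one score per
# entry of a discrete codec vocabulary.
Decoder = Callable[[Sequence[float], Sequence[int]], List[float]]


def generate_tokens(decoder: Decoder, cond_features: Sequence[float],
                    eos_id: int, max_len: int) -> List[int]:
    """Greedy autoregressive decoding conditioned on input speech features."""
    tokens: List[int] = []
    for _ in range(max_len):
        # Each step sees the condition and every token generated so far.
        scores = decoder(cond_features, tokens)
        # Greedy choice: the highest-scoring token id (first one on ties).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens


def toy_decoder(cond: Sequence[float], prev: Sequence[int]) -> List[float]:
    """Hypothetical toy model: quantizes one feature per step, then emits EOS."""
    vocab_size = 5  # ids 0..3 are "codec tokens", id 4 is EOS (assumption)
    scores = [0.0] * vocab_size
    step = len(prev)
    if step < len(cond):
        # Map a feature in [0, 1] to one of four bins.
        scores[min(int(cond[step] * 4), 3)] = 1.0
    else:
        scores[4] = 1.0
    return scores


print(generate_tokens(toy_decoder, [0.1, 0.6, 0.95], eos_id=4, max_len=10))
# → [0, 2, 3]
```

In a real system the toy scorer would be replaced by a trained network and the generated tokens would be turned back into a waveform by a neural audio codec decoder; neither component is shown here, and how UniSE handles these steps is described in the paper itself.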