High-Fidelity Speech Enhancement via Discrete Audio Tokens

2510.02187v1 cs.SD, cs.LG, eess.AS 2025-10-04

Авторы:

Luca A. Lanzendörfer, Frédéric Berdoz, Antonis Asonitis, Roger Wattenhofer

Abstract

Recent autoregressive transformer-based speech enhancement (SE) methods have shown promising results by leveraging advanced semantic understanding and contextual modeling of speech. However, these approaches often rely on complex multi-stage pipelines and low sampling rate codecs, limiting them to narrow and task-specific speech enhancement. In this work, we introduce DAC-SE1, a simplified language model-based SE framework leveraging discrete high-resolution audio representations; DAC-SE1 preserves fine-grained acoustic details while maintaining semantic coherence. Our experiments show that DAC-SE1 surpasses state-of-the-art autoregressive SE methods on both objective perceptual metrics and in a MUSHRA human evaluation. We release our codebase and model checkpoints to support further research in scalable, unified, and high-quality speech enhancement.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

High-Fidelity Speech Enhancement via Discrete Audio Tokens

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report

Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power E...

ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music...

BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decodi...

Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music

Навигация