SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation
2510.00582v1
cs.CL, cs.AI, cs.SD
2025-10-04
Authors:
Sangmin Lee, Woongjib Choi, Jihyun Kim, Hong-Goo Kang
Abstract
In this paper, we present a neural spoken language diarization model that
supports an unconstrained span of languages within a single framework. Our
approach integrates a learnable query-based architecture grounded in
multilingual awareness with large-scale pretraining on simulated
code-switching data. By jointly leveraging these two components, our method
overcomes the limitations that conventional approaches face in data scarcity and
architecture optimization, and generalizes effectively to real-world
multilingual settings across diverse environments. Experimental results
demonstrate that our approach achieves state-of-the-art performance on several
language diarization benchmarks, with a relative performance improvement of 23%
to 52% over previous methods. We believe that this work not only advances
research in language diarization but also establishes a foundational framework
for code-switching speech technologies.
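The abstract does not specify how the simulated code-switching data are generated. As a rough illustration only, one generic way to simulate code-switching is to splice monolingual segments from different languages into a single utterance and derive frame-level language labels from the splice. The sketch below assumes each segment is reduced to a language tag and a frame count. The `Segment` type and the `simulate_code_switch` and `switch_points` functions are hypothetical names introduced for this example. It is not the SAGE-LD pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical monolingual segment: a language tag and a length in frames."""
    language: str
    num_frames: int


def simulate_code_switch(pool, num_segments, rng):
    """Splice randomly drawn monolingual segments into one utterance.

    Consecutive segments are forced to differ in language, so every
    segment boundary is a genuine switch point. Returns the chosen
    segments and a frame-level language label sequence.
    Assumes the pool contains at least two languages.
    """
    segments, labels = [], []
    prev_lang = None
    for _ in range(num_segments):
        # Only draw segments whose language differs from the previous one.
        candidates = [s for s in pool if s.language != prev_lang]
        seg = rng.choice(candidates)
        segments.append(seg)
        # Every frame of this segment carries the segment's language label.
        labels.extend([seg.language] * seg.num_frames)
        prev_lang = seg.language
    return segments, labels


def switch_points(labels):
    """Frame indices where the language label changes from the previous frame."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [Segment("en", 50), Segment("ko", 80), Segment("zh", 30), Segment("en", 20)]
    segs, labels = simulate_code_switch(pool, 4, rng)
    print([s.language for s in segs], switch_points(labels))
```

In a real setup the frame counts would come from actual audio clips, and the spliced waveforms together with their label sequences would serve as training targets for a diarization model. The same idea scales to any number of languages simply by adding segments to the pool.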