SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation

2510.00582v1 cs.CL, cs.AI, cs.SD 2025-10-04

Authors:

Sangmin Lee, Woongjib Choi, Jihyun Kim, Hong-Goo Kang

Abstract

In this paper, we present a neural spoken language diarization model that supports an unconstrained span of languages within a single framework. Our approach integrates a learnable query-based architecture grounded in multilingual awareness, with large-scale pretraining on simulated code-switching data. By jointly leveraging these two components, our method overcomes the limitations of conventional approaches in data scarcity and architecture optimization, and generalizes effectively to real-world multilingual settings across diverse environments. Experimental results demonstrate that our approach achieves state-of-the-art performance on several language diarization benchmarks, with a relative performance improvement of 23% to 52% over previous methods. We believe that this work not only advances research in language diarization but also establishes a foundational framework for code-switching speech technologies.
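As a rough illustration of what "simulated code-switching data" can mean, the following minimal sketch stitches monolingual utterances from different languages into one signal and emits a frame-level language label track. The abstract does not describe the authors' simulation pipeline, so everything here is an assumption made for illustration: the function name `simulate_code_switched`, the 16 kHz sampling rate, the 20 ms label resolution, and the simple concatenation strategy.

```python
import random

SAMPLE_RATE = 16_000  # assumed sampling rate in Hz (not stated in the abstract)
FRAME_HOP = 320       # assumed label resolution: one label per 20 ms at 16 kHz


def simulate_code_switched(pool, num_segments, rng):
    """Build one synthetic code-switched example (hypothetical helper).

    pool: dict mapping a language tag to a list of mono waveforms
          (each waveform is a list of float samples). It must contain at
          least two languages so that every segment boundary is a switch.
    Returns (mixture, labels): the concatenated samples and one language
    tag per FRAME_HOP samples.
    """
    mixture, labels = [], []
    prev_lang = None
    for _ in range(num_segments):
        # Pick a language different from the previous one to force a switch.
        candidates = [lang for lang in pool if lang != prev_lang]
        lang = rng.choice(candidates)
        wave = rng.choice(pool[lang])
        # Trim to a whole number of frames so labels align with samples.
        usable = len(wave) - len(wave) % FRAME_HOP
        mixture.extend(wave[:usable])
        labels.extend([lang] * (usable // FRAME_HOP))
        prev_lang = lang
    return mixture, labels


# Toy usage with silent placeholder "utterances" instead of real speech.
pool = {"en": [[0.0] * 1000], "ko": [[0.1] * 700]}
mixture, labels = simulate_code_switched(pool, num_segments=4, rng=random.Random(0))
print(len(mixture), len(labels))
```

A realistic pipeline would likely also vary segment lengths, smooth the joins, and add noise or reverberation; those steps are omitted here to keep the label-alignment idea visible.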

Related articles

VocalNet-M2: Advancing Low-Latency Spoken Language Modeling via Integrated Multi...

A Critical Review of the Need for Knowledge-Centric Evaluation of Quranic Recita...

Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-t...

Emotion-Aligned Generation in Diffusion Text to Speech Models via Preference-Gui...

SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data
