PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech
2511.03080v1
cs.CL, cs.LG
2025-11-07
Авторы:
Michel Wong, Ali Alshehri, Sophia Kao, Haotian He
Abstract
Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS)
systems, converting written forms into their canonical spoken equivalents.
Traditional TN systems can exhibit high accuracy, but involve substantial
engineering effort, are difficult to scale, and pose challenges to language
coverage, particularly in low-resource settings. We propose PolyNorm, a
prompt-based approach to TN using Large Language Models (LLMs), aiming to
reduce the reliance on manually crafted rules and enable broader linguistic
applicability with minimal human intervention. Additionally, we present a
language-agnostic pipeline for automatic data curation and evaluation, designed
to facilitate scalable experimentation across diverse languages. Experiments
across eight languages show consistent reductions in the word error rate (WER)
compared to a production-grade-based system. To support further research, we
release PolyNorm-Benchmark, a multilingual data set covering a diverse range of
text normalization phenomena.
Ссылки и действия
Дополнительные ресурсы: