A Neural Model for Contextual Biasing Score Learning and Filtering

2510.23849v1 eess.AS, cs.AI, cs.CL, cs.SD 2025-10-30

Authors:

Wanting Huang, Weiran Wang

Abstract

Contextual biasing improves automatic speech recognition (ASR) by integrating external knowledge, such as user-specific phrases or entities, during decoding. In this work, we use an attention-based biasing decoder to produce scores for candidate phrases based on acoustic information extracted by an ASR encoder; these scores can be used to filter out unlikely phrases and to calculate the bonus for shallow-fusion biasing. We introduce a per-token discriminative objective that encourages higher scores for ground-truth phrases while suppressing distractors. Experiments on the Librispeech biasing benchmark show that our method effectively filters out the majority of the candidate phrases and significantly improves recognition accuracy under different biasing conditions when the scores are used in shallow-fusion biasing. Our approach is modular and can be used with any ASR system, and the filtering mechanism can potentially boost the performance of other biasing methods.
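
The two uses of the phrase scores named in the abstract, filtering and a shallow-fusion bonus, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the scores are hard-coded rather than produced by a biasing decoder, the threshold and bonus weight are arbitrary, the function names are hypothetical, and the bonus is applied by rescoring complete hypotheses, whereas shallow fusion normally adds the bonus token by token inside beam search.

```python
def filter_phrases(phrase_scores, threshold):
    """Keep only candidate phrases whose score reaches the threshold."""
    return {p: s for p, s in phrase_scores.items() if s >= threshold}


def contains_phrase(hypothesis, phrase):
    """Whole-word match, so 'ann' does not match inside 'hannah'."""
    return f" {phrase} " in f" {hypothesis} "


def rescore(hypotheses, kept, bonus_weight):
    """Add a score-scaled bonus to each hypothesis log-probability."""
    rescored = []
    for text, log_prob in hypotheses:
        bonus = sum(bonus_weight * s for p, s in kept.items()
                    if contains_phrase(text, p))
        rescored.append((text, log_prob + bonus))
    return rescored


# Hypothetical scores, as if produced by a biasing decoder (assumed values).
phrase_scores = {"joanna smith": 0.9, "librispeech": 0.7, "joe hanna": 0.05}
# Hypothetical n-best list of (text, ASR log-probability) pairs.
hypotheses = [("call joe anna smith", -4.0), ("call joanna smith", -4.5)]

kept = filter_phrases(phrase_scores, threshold=0.5)
rescored = rescore(hypotheses, kept, bonus_weight=1.0)
best_text, best_score = max(rescored, key=lambda pair: pair[1])
print(sorted(kept))
print(best_text)
```

In this toy setting the low-scoring distractor is dropped before biasing, and the bonus for the surviving phrase is enough to move the hypothesis containing it ahead of the acoustically preferred one.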
