SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling
2510.15851v1
cs.CL, cs.LG
2025-10-21
Авторы:
Kadri Hacioglu, Manjunath K E, Andreas Stolcke
Abstract
Slot filling is a crucial subtask in spoken language understanding (SLU),
traditionally implemented as a cascade of speech recognition followed by one or
more natural language understanding (NLU) components. The recent advent of
speech-based large language models (speechLLMs), which integrate speech and
textual foundation models, has opened new avenues for achieving speech
understanding tasks in a more unified, generative, and instruction-following
manner while promising data and compute efficiency with zero-shot abilities,
generalizing to unseen slot labels. We address the slot-filling task by
creating an empirical upper bound for the task, identifying performance,
robustness, and generalization gaps, and proposing improvements to the training
data, architecture, and training strategies to narrow the gap with the upper
bound result. We show that each of these measures improve performance
substantially, while highlighting practical challenges and providing empirical
guidance and insights for harnessing these emerging models.
Ссылки и действия
Дополнительные ресурсы: