MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark

2512.01603v1 cs.CL, cs.MM 2025-12-03
Authors:

Yuezhang Peng, Chonghao Cai, Ziang Liu, Shuai Fan, Sheng Jiang, Hua Xu, Yuxin Liu, Qiguang Chen, Kele Xu, Yao Li, Sheng Wang, Libo Qin, Xie Chen

Abstract

Spoken Language Understanding (SLU), which aims to extract user semantics to execute downstream tasks, is a crucial component of task-oriented dialog systems. Existing SLU datasets generally lack sufficient diversity and complexity, and there is an absence of a unified benchmark for the latest Large Language Models (LLMs) and Large Audio Language Models (LALMs). This work introduces MAC-SLU, a novel Multi-Intent Automotive Cabin Spoken Language Understanding Dataset, which increases the difficulty of the SLU task by incorporating authentic and complex multi-intent data. Based on MAC-SLU, we conducted a comprehensive benchmark of leading open-source LLMs and LALMs, covering methods like in-context learning, supervised fine-tuning (SFT), and end-to-end (E2E) and pipeline paradigms. Our experiments show that while LLMs and LALMs have the potential to complete SLU tasks through in-context learning, their performance still lags significantly behind SFT. Meanwhile, E2E LALMs demonstrate performance comparable to pipeline approaches and effectively avoid error propagation from speech recognition. Code (https://github.com/Gatsby-web/MAC_SLU) and datasets (huggingface.co/datasets/Gatsby1984/MAC_SLU) are released publicly.
