What Do Humans Hear When Interacting? Experiments on Selective Listening for Evaluating ASR of Spoken Dialogue Systems

2508.04402v1 cs.CL 2025-08-09

Authors:

Kiyotada Mori, Seiya Kawano, Chaoran Liu, Carlos Toshinori Ishi, Angel Fernando Garcia Contreras, Koichiro Yoshino

Summary (translated from Russian)

Modern spoken dialogue systems (SDSs) use automatic speech recognition (ASR) to recognize user speech and generate responses. However, ASR often struggles to capture user speech in complex dialogues. The difficulty stems from the difference between the important and unimportant parts of an utterance, a distinction that is key to understanding the user and responding appropriately. This study experimentally confirms that people attend to the important parts of an utterance when forming dialogue responses, which makes it possible to identify what ASR needs to capture. Based on these results, the authors discuss a possible new ASR evaluation method built on how humans listen selectively to the important parts of speech. Such an approach could pinpoint the characteristic shortcomings of ASR when it is used in SDSs.

Abstract

Spoken dialogue systems (SDSs) utilize automatic speech recognition (ASR) at the front end of their pipeline. The role of ASR in SDSs is to appropriately recognize the information in user speech that is relevant to response generation. Examining selective listening of humans, which refers to the ability to focus on and listen to important parts of a conversation during the speech, will enable us to identify the ASR capabilities required for SDSs and evaluate them. In this study, we experimentally confirmed selective listening when humans generate dialogue responses by comparing human transcriptions for generating dialogue responses and reference transcriptions. Based on our experimental results, we discuss the possibility of a new ASR evaluation method that leverages human selective listening, which can identify the gap in transcription ability between ASR systems and humans.
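The comparison described in the abstract (human transcriptions written for response generation versus reference transcriptions) can be illustrated with a minimal sketch. The abstract does not say which metric the authors use, so this is an assumption: the sketch uses word error rate (WER), a standard ASR metric, plus a simple list of reference words that the human left out. The function names `word_error_rate` and `omitted_words` and the example sentences are hypothetical and do not come from the paper.

```python
from difflib import SequenceMatcher


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def omitted_words(reference: str, human: str) -> list[str]:
    """Reference words that were purely dropped in the human transcription.

    Only 'delete' spans of the alignment are reported; substituted words
    are not included. This is an illustrative heuristic, not the paper's
    procedure.
    """
    ref, hum = reference.lower().split(), human.lower().split()
    matcher = SequenceMatcher(a=ref, b=hum, autojunk=False)
    omitted = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            omitted.extend(ref[i1:i2])
    return omitted


if __name__ == "__main__":
    # Made-up example: the listener skips the fillers but keeps the request.
    reference = "um well i would like to book a table for two"
    human = "i would like to book a table for two"
    print(word_error_rate(reference, human))
    print(omitted_words(reference, human))
```

Under this sketch, a human transcription can have a non-zero WER against the reference while still preserving everything needed to respond, which is the kind of gap between plain transcription accuracy and response-relevant listening that the abstract points to.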
