Probing the Hidden Talent of ASR Foundation Models for L2 English Oral Assessment

2510.16387v1 cs.CL, cs.AI, cs.SD, eess.AS 2025-10-22

Authors:

Fu-An Chao, Bi-Cheng Yan, Berlin Chen

Abstract

In this paper, we explore the untapped potential of Whisper, a well-established automatic speech recognition (ASR) foundation model, in the context of L2 spoken language assessment (SLA). Unlike prior studies that extrinsically analyze transcriptions produced by Whisper, our approach goes a step further to probe its latent capabilities by extracting acoustic and linguistic features from hidden representations. With only a lightweight classifier being trained on top of Whisper's intermediate and final outputs, our method achieves strong performance on the GEPT picture-description dataset, outperforming existing cutting-edge baselines, including a multimodal approach. Furthermore, by incorporating image and text-prompt information as auxiliary relevance cues, we demonstrate additional performance gains. Finally, we conduct an in-depth analysis of Whisper's embeddings, which reveals that, even without task-specific fine-tuning, the model intrinsically encodes both ordinal proficiency patterns and semantic aspects of speech, highlighting its potential as a powerful foundation for SLA and other spoken language understanding tasks.
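The idea of training only a lightweight classifier on top of frozen hidden representations can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the synthetic random vectors stand in for Whisper hidden states, and the softmax-weighted layer combination, mean pooling over time, the five proficiency levels, and all function names are assumptions made for the example. Only the forward pass is shown; the head is untrained.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(frames):
    """Average a (time x dim) list of frame vectors into one dim-sized vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def combine_layers(layer_states, layer_logits):
    """Weighted sum of per-layer pooled vectors (assumed design, not from the paper)."""
    weights = softmax(layer_logits)
    pooled = [mean_pool(frames) for frames in layer_states]
    dim = len(pooled[0])
    return [sum(w * p[d] for w, p in zip(weights, pooled)) for d in range(dim)]

def classify(features, weight_rows, biases):
    """Linear head returning probabilities over proficiency levels."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weight_rows, biases)
    ]
    return softmax(logits)

# Synthetic stand-in for frozen encoder outputs (hypothetical sizes).
rng = random.Random(0)
NUM_LAYERS, NUM_FRAMES, DIM, NUM_LEVELS = 4, 10, 8, 5
layer_states = [
    [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_FRAMES)]
    for _ in range(NUM_LAYERS)
]
layer_logits = [0.0] * NUM_LAYERS  # uniform layer weights before any training
head_w = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(NUM_LEVELS)]
head_b = [0.0] * NUM_LEVELS

features = combine_layers(layer_states, layer_logits)
probs = classify(features, head_w, head_b)
predicted_level = max(range(NUM_LEVELS), key=probs.__getitem__)
```

In a real setup the foundation model would stay frozen and only the small head (here `layer_logits`, `head_w`, `head_b`) would be trained on labeled responses, which is what keeps the classifier lightweight.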
