Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

arXiv:2510.06961v2 | cs.CL, cs.AI, cs.SD, eess.AS | 2025-10-10

Authors:

Vaibhav Srivastav, Steven Zheng, Eric Bezzam, Eustache Le Bihan, Nithin Koluguri, Piotr Żelasko, Somshubra Majumdar, Adel Moumen, Sanchit Gandhi

Abstract

Despite rapid progress, ASR evaluation remains saturated with short-form English, and efficiency is rarely reported. We present the Open ASR Leaderboard, a fully reproducible benchmark and interactive leaderboard comparing 60+ open-source and proprietary systems across 11 datasets, including dedicated multilingual and long-form tracks. We standardize text normalization and report both word error rate (WER) and inverse real-time factor (RTFx), enabling fair accuracy-efficiency comparisons. For English transcription, Conformer encoders paired with LLM decoders achieve the best average WER but are slower, while CTC and TDT decoders deliver much better RTFx, making them attractive for long-form and offline use. Whisper-derived encoders fine-tuned for English improve accuracy but often trade off multilingual coverage. All code and dataset loaders are open-sourced to support transparent, extensible evaluation.
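The two reported metrics can be illustrated with a minimal sketch. WER is the word-level edit distance (substitutions + deletions + insertions) between the normalized reference and hypothesis, divided by the number of reference words. RTFx is seconds of audio transcribed per second of processing time, so higher is faster. This code is illustrative and is not the leaderboard's own implementation. The helper names (`normalize`, `word_errors`, `corpus_wer`, `rtfx`) are hypothetical. The lowercase-and-strip-punctuation normalizer is a simplified stand-in for the standardized normalization the benchmark uses. Pooling errors across utterances before dividing is one common convention, and the leaderboard's exact aggregation may differ.

```python
import re


def normalize(text: str) -> str:
    # Hypothetical, simplified normalizer (assumption): lowercase,
    # replace punctuation except apostrophes with spaces, collapse whitespace.
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def word_errors(ref: str, hyp: str) -> tuple[int, int]:
    """Return (word edit distance, number of reference words)."""
    r = normalize(ref).split()
    h = normalize(hyp).split()
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cost = 0 if rw == hw else 1
            cur[j] = min(
                prev[j] + 1,         # deletion
                cur[j - 1] + 1,      # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = cur
    return prev[len(h)], len(r)


def corpus_wer(refs: list[str], hyps: list[str]) -> float:
    # Assumed convention: total errors over total reference words.
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        e, n = word_errors(ref, hyp)
        errors += e
        total += n
    if total == 0:
        raise ValueError("references contain no words")
    return errors / total


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    # Inverse real-time factor: audio duration divided by processing time.
    return audio_seconds / processing_seconds
```

For example, `corpus_wer(["The cat sat on the mat."], ["the cat sat on a mat"])` gives 1/6: normalization removes the casing and punctuation differences, which leaves one substitution across six reference words. A system that transcribes one hour of audio in 36 seconds has `rtfx(3600.0, 36.0) == 100.0`. Normalizing before scoring is what makes cross-system comparison fair, because otherwise formatting choices would be counted as recognition errors.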
