Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
2510.06961v2
cs.CL, cs.AI, cs.SD, eess.AS
2025-10-10
Авторы:
Vaibhav Srivastav, Steven Zheng, Eric Bezzam, Eustache Le Bihan, Nithin Koluguri, Piotr Żelasko, Somshubra Majumdar, Adel Moumen, Sanchit Gandhi
Abstract
Despite rapid progress, ASR evaluation remains saturated with short-form
English, and efficiency is rarely reported. We present the Open ASR
Leaderboard, a fully reproducible benchmark and interactive leaderboard
comparing 60+ open-source and proprietary systems across 11 datasets, including
dedicated multilingual and long-form tracks. We standardize text normalization
and report both word error rate (WER) and inverse real-time factor (RTFx),
enabling fair accuracy-efficiency comparisons. For English transcription,
Conformer encoders paired with LLM decoders achieve the best average WER but
are slower, while CTC and TDT decoders deliver much better RTFx, making them
attractive for long-form and offline use. Whisper-derived encoders fine-tuned
for English improve accuracy but often trade off multilingual coverage. All
code and dataset loaders are open-sourced to support transparent, extensible
evaluation.