MLMA: Towards Multilingual ASR With Mamba-based Architectures

2510.18684v2 cs.CL, cs.SD 2025-10-24

Авторы:

Mohamed Nabih Ali, Daniele Falavigna, Alessio Brutti

Abstract

Multilingual automatic speech recognition (ASR) remains a challenging task, especially when balancing performance across high- and low-resource languages. Recent advances in sequence modeling suggest that architectures beyond Transformers may offer better scalability and efficiency. In this work, we introduce MLMA (Multilingual Language Modeling with Mamba for ASR), a new approach that leverages the Mamba architecture -- an efficient state-space model optimized for long-context sequence processing -- for multilingual ASR. Using Mamba, MLMA implicitly incorporates language-aware conditioning and shared representations to support robust recognition across diverse languages. Experiments on standard multilingual benchmarks show that MLMA achieves competitive performance compared to Transformer-based architectures. These results highlight Mamba's potential as a strong backbone for scalable, efficient, and accurate multilingual speech recognition.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

MLMA: Towards Multilingual ASR With Mamba-based Architectures

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Dialect Identification Using Resource-Efficient Fine-Tuning Approaches

A new kid on the block: Distributional semantics predicts the word-specific tone...

CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokk...

POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Tex...

CantoASR: Prosody-Aware ASR-LALM Collaboration for Low-Resource Cantonese

Навигация