MLMA: Towards Multilingual ASR With Mamba-based Architectures
2510.18684v2
cs.CL, cs.SD
2025-10-24
Authors:
Mohamed Nabih Ali, Daniele Falavigna, Alessio Brutti
Abstract
Multilingual automatic speech recognition (ASR) remains a challenging task,
especially when balancing performance across high- and low-resource languages.
Recent advances in sequence modeling suggest that architectures beyond
Transformers may offer better scalability and efficiency. In this work, we
introduce MLMA (Multilingual Language Modeling with Mamba for ASR), a new
approach that leverages the Mamba architecture (an efficient state-space
model optimized for long-context sequence processing) for multilingual ASR.
Using Mamba, MLMA implicitly incorporates language-aware conditioning and
shared representations to support robust recognition across diverse languages.
Experiments on standard multilingual benchmarks show that MLMA achieves
competitive performance compared to Transformer-based architectures. These
results highlight Mamba's potential as a strong backbone for scalable,
efficient, and accurate multilingual speech recognition.
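The abstract describes Mamba as an efficient state-space model for long-context sequences. The sketch below is a rough illustration of that idea only. It is not the MLMA implementation, which the abstract does not detail. It runs a single-channel selective state-space recurrence in the spirit of Mamba: A is diagonal with negative entries, the step size Delta_t and the projections B_t and C_t depend on the current input, and the sequence is processed in one linear-time pass. The scalar projection weights `w_b`, `w_c`, `w_delta`, and `b_delta` are hypothetical simplifications. Real Mamba layers compute these quantities from the full feature vector and use a hardware-aware parallel scan rather than a Python loop.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: Sequence[float],
    a: Sequence[float],    # diagonal of A; negative entries keep the state stable
    w_b: Sequence[float],  # hypothetical projection: B_t[i] = w_b[i] * x_t
    w_c: Sequence[float],  # hypothetical projection: C_t[i] = w_c[i] * x_t
    w_delta: float,        # hypothetical: Delta_t = softplus(w_delta * x_t + b_delta)
    b_delta: float,
    d: float = 0.0,        # weight of the direct skip connection D * x_t
) -> List[float]:
    """Single-channel selective SSM: one pass, O(len(x) * state size)."""
    n = len(a)
    if not (len(w_b) == len(w_c) == n):
        raise ValueError("a, w_b and w_c must have the same state size")
    h = [0.0] * n  # hidden state, initialised to zero
    ys: List[float] = []
    for x_t in x:
        # Input-dependent step size: this is what makes the scan "selective".
        delta = softplus(w_delta * x_t + b_delta)
        y_t = d * x_t
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # zero-order-hold discretization of A
            b_bar = delta * w_b[i] * x_t    # simplified (Euler) discretization of B_t
            h[i] = a_bar * h[i] + b_bar * x_t
            y_t += (w_c[i] * x_t) * h[i]    # read-out through C_t
        ys.append(y_t)
    return ys
```

For example, `selective_scan([1.0, 0.0], a=[-1.0], w_b=[1.0], w_c=[1.0], w_delta=0.0, b_delta=0.0)` gives log 2 at the first step (since softplus(0) = log 2) and 0 at the second, because C_t vanishes when the input is zero. The fixed-size state `h` is what lets such models scale linearly with sequence length, in contrast to the quadratic cost of full self-attention.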