Open Source State-Of-the-Art Solution for Romanian Speech Recognition
2511.03361v1
eess.AS, cs.AI
2025-11-07
Authors:
Gabriel Pirlogeanu, Alexandru-Lucian Georgescu, Horia Cucu
Abstract
In this work, we present a new state-of-the-art Romanian Automatic Speech
Recognition (ASR) system based on NVIDIA's FastConformer architecture, explored
here for the first time in the context of Romanian. We train our model on a
large corpus of mostly weakly supervised transcriptions, totaling over 2,600
hours of speech. Leveraging a hybrid decoder with both Connectionist Temporal
Classification (CTC) and Token-and-Duration Transducer (TDT) branches, we evaluate
a range of decoding strategies including greedy decoding, alignment-length
synchronous decoding (ALSD), and CTC beam search with
a 6-gram token-level language model. Our system achieves state-of-the-art
performance across all Romanian evaluation benchmarks, including read,
spontaneous, and domain-specific speech, with up to 27% relative WER reduction
compared to previous best-performing systems. In addition to improved
transcription accuracy, our approach demonstrates practical decoding
efficiency, making it suitable for both research and deployment in low-latency
ASR applications.
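
To make the "relative WER reduction" figure concrete, the following is a minimal sketch of how word error rate (WER) and a relative reduction between two systems are commonly computed. It is not taken from the paper: the function names `word_error_rate` and `relative_wer_reduction` are hypothetical, and the baseline and new WER values in the usage lines are invented purely for illustration, not reported results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (assumed, not from the paper).
example_wer = word_error_rate("a b c d", "a x c")
example_reduction = relative_wer_reduction(baseline_wer=10.0, new_wer=7.3)
print(f"WER: {example_wer:.2f}, relative reduction: {example_reduction:.0%}")
```

Under this definition, a relative reduction is measured against the baseline system's WER rather than in absolute percentage points, so the same relative figure can correspond to very different absolute gains on different benchmarks.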