BanglaTalk: Towards Real-Time Speech Assistance for Bengali Regional Dialects
2510.06188v1
cs.CL, cs.AI, cs.LG
2025-10-09
Авторы:
Jakir Hasan, Shubhashis Roy Dipta
Abstract
Real-time speech assistants are becoming increasingly popular for ensuring
improved accessibility to information. Bengali, being a low-resource language
with a high regional dialectal diversity, has seen limited progress in
developing such systems. Existing systems are not optimized for real-time use
and focus only on standard Bengali. In this work, we present BanglaTalk, the
first real-time speech assistance system for Bengali regional dialects.
BanglaTalk follows the client-server architecture and uses the Real-time
Transport Protocol (RTP) to ensure low-latency communication. To address
dialectal variation, we introduce a dialect-aware ASR system, BRDialect,
developed by fine-tuning the IndicWav2Vec model in ten Bengali regional
dialects. It outperforms the baseline ASR models by 12.41-33.98% on the
RegSpeech12 dataset. Furthermore, BanglaTalk can operate at a low bandwidth of
24 kbps while maintaining an average end-to-end delay of 4.9 seconds. Low
bandwidth usage and minimal end-to-end delay make the system both
cost-effective and interactive for real-time use cases, enabling inclusive and
accessible speech technology for the diverse community of Bengali speakers.
Ссылки и действия
Дополнительные ресурсы: