Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti
2510.18898v1
cs.CL, cs.CY
2025-10-24
Authors:
Mangsura Kabir Oni, Tabia Tanzin Prama
Abstract
Machine Translation (MT) has advanced from rule-based and statistical methods
to neural approaches based on the Transformer architecture. While these methods
have achieved impressive results for high-resource languages, low-resource
varieties such as Sylheti remain underexplored. In this work, we investigate
Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models
and comparing them with zero-shot large language models (LLMs). Experimental
results demonstrate that fine-tuned models significantly outperform LLMs, with
mBART-50 achieving the highest translation adequacy and MarianMT showing the
strongest character-level fidelity. These findings highlight the importance of
task-specific adaptation for underrepresented languages and contribute to
ongoing efforts toward inclusive language technologies.