FastPOS: Language-Agnostic Scalable POS Tagging Framework Low-Resource Use Case
2512.00745v1
cs.CL
2025-12-04
Authors:
Md Abdullah Al Kafi, Sumit Kumar Banshal
Abstract
This study proposes a language-agnostic transformer-based POS tagging framework designed for low-resource languages, using Bangla and Hindi as case studies. With only three lines of framework-specific code, the model was adapted from Bangla to Hindi, demonstrating effective portability with minimal modification. The framework achieves 96.85 percent and 97 percent token-level accuracy across POS categories in Bangla and Hindi, respectively, while sustaining strong F1 scores despite dataset imbalance and linguistic overlap. A performance discrepancy in a specific POS category underscores ongoing challenges in dataset curation. The strong results stem from the underlying transformer architecture, which can be replaced with limited code adjustments. The framework's modular and open-source design enables rapid cross-lingual adaptation while reducing model design and tuning overhead, allowing researchers to focus on linguistic preprocessing and dataset refinement, which are essential for advancing NLP in underrepresented languages.