Parameter-Efficient Fine-Tuning for Low-Resource Languages: A Comparative Study of LLMs for Bengali Hate Speech Detection
2510.16985v1
cs.CL, cs.AI, cs.LG
2025-10-22
Авторы:
Akif Islam, Mohd Ruhul Ameen
Abstract
Bengali social media platforms have witnessed a sharp increase in hate
speech, disproportionately affecting women and adolescents. While datasets such
as BD-SHS provide a basis for structured evaluation, most prior approaches rely
on either computationally costly full-model fine-tuning or proprietary APIs.
This paper presents the first application of Parameter-Efficient Fine-Tuning
(PEFT) for Bengali hate speech detection using LoRA and QLoRA. Three
instruction-tuned large language models - Gemma-3-4B, Llama-3.2-3B, and
Mistral-7B - were fine-tuned on the BD-SHS dataset of 50,281 annotated
comments. Each model was adapted by training fewer than 1% of its parameters,
enabling experiments on a single consumer-grade GPU. The results show that
Llama-3.2-3B achieved the highest F1-score of 92.23%, followed by Mistral-7B at
88.94% and Gemma-3-4B at 80.25%. These findings establish PEFT as a practical
and replicable strategy for Bengali and related low-resource languages.
Ссылки и действия
Дополнительные ресурсы: