Algorithmic Fairness in NLP: Persona-Infused LLMs for Human-Centric Hate Speech Detection
2510.19331v1
cs.CL, cs.CY
2025-10-24
Авторы:
Ewelina Gajewska, Arda Derbent, Jaroslaw A Chudziak, Katarzyna Budzynska
Abstract
In this paper, we investigate how personalising Large Language Models
(Persona-LLMs) with annotator personas affects their sensitivity to hate
speech, particularly regarding biases linked to shared or differing identities
between annotators and targets. To this end, we employ Google's Gemini and
OpenAI's GPT-4.1-mini models and two persona-prompting methods: shallow persona
prompting and a deeply contextualised persona development based on
Retrieval-Augmented Generation (RAG) to incorporate richer persona profiles. We
analyse the impact of using in-group and out-group annotator personas on the
models' detection performance and fairness across diverse social groups. This
work bridges psychological insights on group identity with advanced NLP
techniques, demonstrating that incorporating socio-demographic attributes into
LLMs can address bias in automated hate speech detection. Our results highlight
both the potential and limitations of persona-based approaches in reducing
bias, offering valuable insights for developing more equitable hate speech
detection systems.
Ссылки и действия
Дополнительные ресурсы: