Algorithmic Fairness in NLP: Persona-Infused LLMs for Human-Centric Hate Speech Detection

arXiv:2510.19331v1 | cs.CL, cs.CY | 2025-10-24

Authors:

Ewelina Gajewska, Arda Derbent, Jaroslaw A Chudziak, Katarzyna Budzynska

Abstract

In this paper, we investigate how personalising Large Language Models (Persona-LLMs) with annotator personas affects their sensitivity to hate speech, particularly regarding biases linked to shared or differing identities between annotators and targets. To this end, we employ Google's Gemini and OpenAI's GPT-4.1-mini models and two persona-prompting methods: shallow persona prompting and a deeply contextualised persona development based on Retrieval-Augmented Generation (RAG) to incorporate richer persona profiles. We analyse the impact of using in-group and out-group annotator personas on the models' detection performance and fairness across diverse social groups. This work bridges psychological insights on group identity with advanced NLP techniques, demonstrating that incorporating socio-demographic attributes into LLMs can address bias in automated hate speech detection. Our results highlight both the potential and limitations of persona-based approaches in reducing bias, offering valuable insights for developing more equitable hate speech detection systems.

Related articles

Identifying attributions of causality in political text

Sycophancy Claims about Language Models: The Missing Human-in-the-Loop

CAIRNS: Balancing Readability and Scientific Accuracy in Climate Adaptation Ques...

Gender Bias in Emotion Recognition by Large Language Models

Analysing Personal Attacks in U.S. Presidential Debates
