Surfacing Subtle Stereotypes: A Multilingual, Debate-Oriented Evaluation of Modern LLMs
2511.01187v1
cs.CL, cs.CY
2025-11-06
Авторы:
Muhammed Saeed, Muhammad Abdul-mageed, Shady Shehata
Abstract
Large language models (LLMs) are widely deployed for open-ended
communication, yet most bias evaluations still rely on English,
classification-style tasks. We introduce DebateBias-8K, a new multilingual,
debate-style benchmark designed to reveal how narrative bias appears in
realistic generative settings. Our dataset includes 8,400 structured debate
prompts spanning four sensitive domains: women's rights, socioeconomic
development, terrorism, and religion, across seven languages ranging from
high-resource (English, Chinese) to low-resource (Swahili, Nigerian Pidgin).
Using four flagship models (GPT-4o, Claude 3, DeepSeek, and LLaMA 3), we
generate and automatically classify over 100,000 responses. Results show that
all models reproduce entrenched stereotypes despite safety alignment: Arabs are
overwhelmingly linked to terrorism and religion (>=95%), Africans to
socioeconomic "backwardness" (up to <=77%), and Western groups are consistently
framed as modern or progressive. Biases grow sharply in lower-resource
languages, revealing that alignment trained primarily in English does not
generalize globally. Our findings highlight a persistent divide in multilingual
fairness: current alignment methods reduce explicit toxicity but fail to
prevent biased outputs in open-ended contexts. We release our DebateBias-8K
benchmark and analysis framework to support the next generation of multilingual
bias evaluation and safer, culturally inclusive model alignment.
Ссылки и действия
Дополнительные ресурсы: