RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
2509.25897v1
cs.CL, cs.AI, cs.CY
2025-10-02
Авторы:
Jisu Shin, Hoyun Song, Juhyun Oh, Changgeon Ko, Eunsu Kim, Chani Jung, Alice Oh
Abstract
Humans often encounter role conflicts -- social dilemmas where the
expectations of multiple roles clash and cannot be simultaneously fulfilled. As
large language models (LLMs) become increasingly influential in human
decision-making, understanding how they behave in complex social situations is
essential. While previous research has evaluated LLMs' social abilities in
contexts with predefined correct answers, role conflicts represent inherently
ambiguous social dilemmas that require contextual sensitivity: the ability to
recognize and appropriately weigh situational cues that can fundamentally alter
decision priorities. To address this gap, we introduce RoleConflictBench, a
novel benchmark designed to evaluate LLMs' contextual sensitivity in complex
social dilemmas. Our benchmark employs a three-stage pipeline to generate over
13K realistic role conflict scenarios across 65 roles, systematically varying
their associated expectations (i.e., their responsibilities and obligations)
and situational urgency levels. By analyzing model choices across 10 different
LLMs, we find that while LLMs show some capacity to respond to these contextual
cues, this sensitivity is insufficient. Instead, their decisions are
predominantly governed by a powerful, inherent bias related to social roles
rather than situational information. Our analysis quantifies these biases,
revealing a dominant preference for roles within the Family and Occupation
domains, as well as a clear prioritization of male roles and Abrahamic
religions across most evaluatee models.
Ссылки и действия
Дополнительные ресурсы: