Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization
2510.07517v1
cs.AI, cs.MA
2025-10-11
Авторы:
Hyeong Kyu Choi, Xiaojin Zhu, Yixuan Li
Abstract
Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning
by letting multiple agents exchange answers and then aggregate their opinions.
Yet recent studies reveal that agents are not neutral: they are prone to
identity-driven sycophancy and self-bias, uncritically adopting a peer's view
or stubbornly adhering to their own prior output, undermining the reliability
of debate. In this work, we present the first principled framework that joins
sycophancy and self-bias to mitigate and quantify identity bias in MAD. First,
we formalize the debate dynamics as an identity-weighted Bayesian update
process. Second, we propose response anonymization: by removing identity
markers from prompts, agents cannot distinguish "self" from "peer", which
forces equal weights on agent identity, thereby reducing bias. Third, we define
the Identity Bias Coefficient (IBC), a principled metric that measures how
often an agent follows a peer versus itself. Empirical studies across multiple
models, datasets and debate rounds confirm that identity bias is widespread,
with sycophancy far more common than self-bias. Our findings highlight the need
to "mask" identity to ensure that MAD systems reason based on content rather
than source identity. Code is released in
https://github.com/deeplearning-wisc/MAD-identity-bias.
Ссылки и действия
Дополнительные ресурсы: