When Empowerment Disempowers
2511.04177v1
cs.AI, cs.MA
2025-11-08
Authors:
Claire Yang, Maya Cakmak, Max Kleiman-Weiner
Abstract
Empowerment, a measure of an agent's ability to control its environment, has
been proposed as a universal goal-agnostic objective for motivating assistive
behavior in AI agents. While multi-human settings like homes and hospitals are
promising for AI assistance, prior work on empowerment-based assistance assumes
that the agent assists one human in isolation. We introduce Disempower-Grid, an
open-source multi-human gridworld test suite. Using Disempower-Grid, we
empirically show that assistive RL agents optimizing for one human's
empowerment can significantly reduce another human's environmental influence
and rewards, a phenomenon we formalize as disempowerment. We characterize when
disempowerment occurs in these environments and show that joint empowerment
mitigates disempowerment at the cost of the user's reward. Our work reveals a
broader challenge for the AI alignment community: goal-agnostic objectives that
seem aligned in single-agent settings can become misaligned in multi-agent
contexts.