Social Welfare Function Leaderboard: When LLM Agents Allocate Social Welfare

2510.01164v1 cs.CL, cs.AI, cs.CY, cs.HC 2025-10-04

Authors:

Zhengliang Shi, Ruotian Ma, Jen-tse Huang, Xinbei Ma, Xingyu Chen, Mengru Wang, Qu Yang, Yue Wang, Fanghua Ye, Ziyang Chen, Shanyi Wang, Cixing Li, Wenxuan Wang, Zhaopeng Tu, Xiaolong Li, Zhaochun Ren, Linus

Abstract

Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. However, the principles and values that guide these models when distributing scarce societal resources remain largely unexamined. To address this, we introduce the Social Welfare Function (SWF) Benchmark, a dynamic simulation environment where an LLM acts as a sovereign allocator, distributing tasks to a heterogeneous community of recipients. The benchmark is designed to create a persistent trade-off between maximizing collective efficiency (measured by Return on Investment) and ensuring distributive fairness (measured by the Gini coefficient). We evaluate 20 state-of-the-art LLMs and present the first leaderboard for social welfare allocation. Our findings reveal three key insights: (i) A model's general conversational ability, as measured by popular leaderboards, is a poor predictor of its allocation skill. (ii) Most LLMs exhibit a strong default utilitarian orientation, prioritizing group productivity at the expense of severe inequality. (iii) Allocation strategies are highly vulnerable, easily perturbed by output-length constraints and social-influence framing. These results highlight the risks of deploying current LLMs as societal decision-makers and underscore the need for specialized benchmarks and targeted alignment for AI governance.
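The efficiency-versus-fairness trade-off described in the abstract can be illustrated with a small sketch. It is not the benchmark's implementation: the function names, the choice to compute the Gini coefficient over per-recipient task counts, the definition of ROI as total output divided by total tasks allocated, and the per-recipient productivity numbers are all assumptions made for illustration.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0.0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula over the ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def roi(allocation: Sequence[float], productivity: Sequence[float]) -> float:
    """Assumed ROI: total output produced per task allocated."""
    output = sum(a * p for a, p in zip(allocation, productivity))
    return output / sum(allocation)


# Hypothetical community: three recipients with different output per task.
productivity = [1.0, 2.0, 4.0]

equal_split = [4, 4, 4]       # egalitarian allocation of 12 tasks
winner_take_all = [0, 0, 12]  # utilitarian allocation of the same 12 tasks

for name, alloc in [("equal", equal_split), ("winner-take-all", winner_take_all)]:
    print(f"{name}: ROI={roi(alloc, productivity):.2f}, Gini={gini(alloc):.2f}")
```

Under these assumed numbers, sending every task to the most productive recipient raises ROI while pushing the Gini coefficient toward its maximum, which is the kind of tension the benchmark is designed to sustain.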


