Training Proactive and Personalized LLM Agents
2511.02208v1
cs.AI, cs.CL, cs.LG
2025-11-06
Авторы:
Weiwei Sun, Xuhui Zhou, Weihua Du, Xingyao Wang, Sean Welleck, Graham Neubig, Maarten Sap, Yiming Yang
Abstract
While existing work focuses primarily on task success, we argue that
effective real-world agents require optimizing three dimensions: productivity
(task completion), proactivity (asking essential questions), and
personalization (adapting to diverse user preferences). We introduce UserVille,
an interactive environment with LLM-based user simulators enabling diverse,
configurable user preferences. Leveraging UserVille, we introduce PPP, a
multi-objective reinforcement learning approach that jointly optimizes all
three dimensions: Productivity, Proactivity, and Personalization. Experiments
on software engineering and deep research tasks show that agents trained with
PPP achieve substantial improvements over strong baselines such as GPT-5 (+21.6
on average), demonstrating the ability to ask strategic clarifying questions,
adapt to unseen user preferences, and improve task success through better
interaction. This work demonstrates that explicitly optimizing for
user-centered interaction is critical for building practical and effective AI
agents.
Ссылки и действия
Дополнительные ресурсы: