How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models
2510.02453v1
cs.LG, cs.AI, cs.CL
2025-10-07
Authors:
Parth Asawa, Alan Zhu, Matei Zaharia, Alexandros G. Dimakis, Joseph E. Gonzalez
Abstract
Foundation models are increasingly deployed as black-box services, where
model weights cannot be modified and customization is limited to prompting.
While static prompt optimization has shown promise, it produces a single fixed
prompt that fails to adapt to different inputs, users, or environments. We
introduce Advisor Models, lightweight parametric policies trained with
reinforcement learning to reactively issue natural language steering
instructions in-context to black-box models. The advisor is a second small
model that sits between the input and the black-box model, shaping behavior on a
per-instance basis using reward signals from the environment. Across multiple
domains involving reasoning and personalization, we show that Advisor Models
outperform static prompt optimizers, discovering environment dynamics and
improving downstream task performance. We also demonstrate the generalizability
of advisors by transferring them across black-box models, as well as the
framework's ability to achieve specialization while retaining robustness to
out-of-distribution inputs. Viewed more broadly, Advisor Models provide a
learnable interface to black-box systems where the advisor acts as a
parametric, environment-specific memory. We argue that dynamic optimization of
black-box models via Advisor Models is a promising direction for enabling
personalization and environment-adaptable AI with frontier-level capabilities.
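
The loop described in the abstract (advisor emits per-instance advice, the black-box model consumes it, and an environment reward trains only the advisor) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not name the RL algorithm or the advisor architecture, so the sketch substitutes a tabular softmax policy over two fixed advice strings, trained with plain REINFORCE. The names `ADVICE`, `black_box_model`, `environment_reward`, and `ToyAdvisor` are all hypothetical, and the "black box" is a local stub standing in for a remote LLM API.

```python
import math
import random

# Hypothetical candidate steering instructions the toy advisor can emit.
# (An actual advisor would be a small language model generating free text.)
ADVICE = ["Answer in one short sentence.", "Answer with a detailed explanation."]

def black_box_model(prompt: str) -> str:
    """Stub for a remote LLM: its weights are never read or modified."""
    return "short" if "one short sentence" in prompt else "long"

def environment_reward(user: str, response: str) -> float:
    """Hidden per-user preference the advisor must discover from reward alone."""
    preferred = {"alice": "short", "bob": "long"}
    return 1.0 if response == preferred[user] else 0.0

class ToyAdvisor:
    """Tabular softmax policy over ADVICE, with one row of logits per user."""

    def __init__(self, users):
        self.logits = {u: [0.0] * len(ADVICE) for u in users}

    def probs(self, user):
        exps = [math.exp(l) for l in self.logits[user]]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, user, rng):
        return rng.choices(range(len(ADVICE)), weights=self.probs(user))[0]

    def update(self, user, action, reward, lr=0.5, baseline=0.5):
        # REINFORCE: d log pi(a) / d logit_k = 1[k == a] - pi(k).
        p = self.probs(user)
        for k in range(len(ADVICE)):
            grad = (1.0 if k == action else 0.0) - p[k]
            self.logits[user][k] += lr * (reward - baseline) * grad

def train(steps=500, seed=0):
    rng = random.Random(seed)
    users = ["alice", "bob"]
    advisor = ToyAdvisor(users)
    for _ in range(steps):
        user = rng.choice(users)
        action = advisor.sample(user, rng)
        # The advice is placed in-context, ahead of the task input.
        prompt = f"{ADVICE[action]}\nQuestion from {user}: what is RL?"
        response = black_box_model(prompt)
        # Only the advisor is updated; the black-box model stays fixed.
        advisor.update(user, action, environment_reward(user, response))
    return advisor
```

Calling `train()` and then `advisor.probs("alice")` shows the policy concentrating on the advice each user's hidden preference rewards, while `black_box_model` is only ever called, never changed. A static prompt would have to pick one of the two instructions for both users, which is the limitation the abstract attributes to fixed-prompt optimizers.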