SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
2510.26769v1
cs.CV, cs.LG
2025-11-01
Авторы:
Anushka Sivakumar, Andrew Zhang, Zaber Hakim, Chris Thomas
Abstract
This work introduces SteerVLM, a lightweight steering module designed to
guide Vision-Language Models (VLMs) towards outputs that better adhere to
desired instructions. Our approach learns from the latent embeddings of paired
prompts encoding target and converse behaviors to dynamically adjust
activations connecting the language modality with image context. This allows
for fine-grained, inference-time control over complex output semantics without
modifying model weights while preserving performance on off-target tasks. Our
steering module requires learning parameters equal to 0.14% of the original
VLM's size. Our steering module gains model control through dimension-wise
activation modulation and adaptive steering across layers without requiring
pre-extracted static vectors or manual tuning of intervention points.
Furthermore, we introduce VNIA (Visual Narrative Intent Alignment), a
multimodal dataset specifically created to facilitate the development and
evaluation of VLM steering techniques. Our method outperforms existing
intervention techniques on steering and hallucination mitigation benchmarks for
VLMs and proposes a robust solution for multimodal model control through
activation engineering.
Ссылки и действия
Дополнительные ресурсы: