VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion
2510.16446v1
cs.CV, cs.LG
2025-10-22
Авторы:
Jaekyun Park, Hye Won Chung
Abstract
In the era of large-scale foundation models, fully fine-tuning pretrained
networks for each downstream task is often prohibitively resource-intensive.
Prompt tuning offers a lightweight alternative by introducing tunable prompts
while keeping the backbone frozen. However, existing visual prompt tuning
methods often fail to specialize the prompts or enrich the representation
space--especially when applied to self-supervised backbones. We show that these
limitations become especially pronounced in challenging tasks and data-scarce
settings, where effective adaptation is most critical. In this work, we
introduce VIPAMIN, a visual prompt initialization strategy that enhances
adaptation of self-supervised models by (1) aligning prompts with semantically
informative regions in the embedding space, and (2) injecting novel
representational directions beyond the pretrained subspace. Despite its
simplicity--requiring only a single forward pass and lightweight
operations--VIPAMIN consistently improves performance across diverse tasks and
dataset sizes, setting a new state of the art in visual prompt tuning. Our code
is available at https://github.com/iamjaekyun/vipamin.
Ссылки и действия
Дополнительные ресурсы: