Prompt Estimation from Prototypes for Federated Prompt Tuning of Vision Transformers
2510.25372v1
cs.CV, cs.LG
2025-10-31
Авторы:
M Yashwanth, Sharannya Ghosh, Aditay Tripathi, Anirban Chakraborty
Abstract
Visual Prompt Tuning (VPT) of pre-trained Vision Transformers (ViTs) has
proven highly effective as a parameter-efficient fine-tuning technique for
adapting large models to downstream tasks with limited data. Its parameter
efficiency makes it particularly suitable for Federated Learning (FL), where
both communication and computation budgets are often constrained. However,
global prompt tuning struggles to generalize across heterogeneous clients,
while personalized tuning overfits to local data and lacks generalization. We
propose PEP-FedPT (Prompt Estimation from Prototypes for Federated Prompt
Tuning), a unified framework designed to achieve both generalization and
personalization in federated prompt tuning of ViTs. Within this framework, we
introduce the novel Class-Contextualized Mixed Prompt (CCMP) - based on
class-specific prompts maintained alongside a globally shared prompt. For each
input, CCMP adaptively combines class-specific prompts using weights derived
from global class prototypes and client class priors. This approach enables
per-sample prompt personalization without storing client-dependent trainable
parameters. The prompts are collaboratively optimized via traditional federated
averaging technique on the same. Comprehensive evaluations on CIFAR-100,
TinyImageNet, DomainNet, and iNaturalist datasets demonstrate that PEP-FedPT
consistently surpasses the state-of-the-art baselines under diverse data
heterogeneity scenarios, establishing a strong foundation for efficient and
generalizable federated prompt tuning of Vision Transformers.
Ссылки и действия
Дополнительные ресурсы: