Zero-shot image privacy classification with Vision-Language Models
2510.09253v1
cs.CV, cs.LG, cs.MM
2025-10-14
Авторы:
Alina Elena Baia, Alessio Xompero, Andrea Cavallaro
Abstract
While specialized learning-based models have historically dominated image
privacy prediction, the current literature increasingly favours adopting large
Vision-Language Models (VLMs) designed for generic tasks. This trend risks
overlooking the performance ceiling set by purpose-built models due to a lack
of systematic evaluation. To address this problem, we establish a zero-shot
benchmark for image privacy classification, enabling a fair comparison. We
evaluate the top-3 open-source VLMs, according to a privacy benchmark, using
task-aligned prompts and we contrast their performance, efficiency, and
robustness against established vision-only and multi-modal methods.
Counter-intuitively, our results show that VLMs, despite their
resource-intensive nature in terms of high parameter count and slower
inference, currently lag behind specialized, smaller models in privacy
prediction accuracy. We also find that VLMs exhibit higher robustness to image
perturbations.
Ссылки и действия
Дополнительные ресурсы: