Zero-shot image privacy classification with Vision-Language Models

2510.09253v1 cs.CV, cs.LG, cs.MM 2025-10-14

Authors:

Alina Elena Baia, Alessio Xompero, Andrea Cavallaro

Abstract

While specialized learning-based models have historically dominated image privacy prediction, the current literature increasingly favours adopting large Vision-Language Models (VLMs) designed for generic tasks. This trend risks overlooking the performance ceiling set by purpose-built models due to a lack of systematic evaluation. To address this problem, we establish a zero-shot benchmark for image privacy classification, enabling a fair comparison. We evaluate the top-3 open-source VLMs, according to a privacy benchmark, using task-aligned prompts and we contrast their performance, efficiency, and robustness against established vision-only and multi-modal methods. Counter-intuitively, our results show that VLMs, despite their resource-intensive nature in terms of high parameter count and slower inference, currently lag behind specialized, smaller models in privacy prediction accuracy. We also find that VLMs exhibit higher robustness to image perturbations.
