Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing
2511.01952v1
cs.CR, cs.AI
2025-11-06
Authors:
Jinhua Yin, Peiru Yang, Chen Yang, Huili Wang, Zhiyang Hu, Shangguang Wang, Yongfeng Huang, Tao Qi
Abstract
Large vision-language models (LVLMs) derive their capabilities from extensive
training on vast corpora of visual and textual data. Empowered by large-scale
parameters, these models often exhibit strong memorization of their training
data, rendering them susceptible to membership inference attacks (MIAs).
Existing MIA methods for LVLMs typically operate under white- or gray-box
assumptions, extracting likelihood-based features for the suspected data
samples from the target LVLM. However, mainstream LVLMs generally only
expose generated outputs while concealing internal computational features
during inference, limiting the applicability of these methods. In this work, we
propose the first black-box MIA framework for LVLMs, based on a prior
knowledge-calibrated memory probing mechanism. The core idea is to assess the
model's memorization of the private semantic information embedded within the
suspected image data, which is unlikely to be inferred from general world
knowledge alone. We conducted extensive experiments across four LVLMs and three
datasets. Empirical results demonstrate that our method effectively identifies
training data of LVLMs in a purely black-box setting and even achieves
performance comparable to gray-box and white-box methods. Further analysis
reveals the robustness of our method against potential adversarial
manipulations and the effectiveness of its design choices. Our code and
data are available at https://github.com/spmede/KCMP.