ImpMIA: Leveraging Implicit Bias for Membership Inference Attack under Realistic Scenarios
2510.10625v1
cs.LG, cs.CR, cs.CV
2025-10-15
Authors:
Yuval Golbari, Navve Wasserman, Gal Vardi, Michal Irani
Abstract
Determining which data samples were used to train a model, a task known as
Membership Inference Attack (MIA), is a well-studied and important problem with
implications for data privacy. Black-box methods presume access only to the
model's outputs and often rely on training auxiliary reference models. While
they have shown strong empirical performance, they rely on assumptions that
rarely hold in real-world settings: (i) the attacker knows the training
hyperparameters; (ii) all available non-training samples come from the same
distribution as the training data; and (iii) the fraction of training data in
the evaluation set is known. In this paper, we demonstrate that removing these
assumptions leads to a significant drop in the performance of black-box
attacks. We introduce ImpMIA, a Membership Inference Attack that exploits the
Implicit Bias of neural networks and hence removes the need to rely on any
reference models and their assumptions. ImpMIA is a white-box attack -- a
setting which assumes access to model weights and is becoming increasingly
realistic given that many models are publicly available (e.g., via Hugging
Face). Building on maximum-margin implicit bias theory, ImpMIA uses the
Karush-Kuhn-Tucker (KKT) optimality conditions to identify training samples.
This is done by finding the samples whose gradients most strongly reconstruct
the trained model's parameters. As a result, ImpMIA achieves state-of-the-art
performance compared to both black-box and white-box attacks in realistic settings
where only the model weights and a superset of the training data are available.
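
One way to make the KKT-based idea above concrete is the stationarity condition from max-margin implicit bias theory. The sketch below assumes the standard binary-classification setting with a homogeneous network $f(\theta; x)$ and labels $y_i \in \{-1, +1\}$. The symbols $\theta$, $\lambda_i$, $f$, and the candidate index $i = 1, \dots, m$ are notation introduced here for illustration and do not appear in the abstract; the paper's exact formulation may differ.

```latex
% Max-margin problem whose KKT points the trained parameters are
% assumed to approach in direction (illustrative notation):
\min_{\theta} \; \tfrac{1}{2}\,\|\theta\|^{2}
\quad \text{s.t.} \quad y_i\, f(\theta; x_i) \ge 1 \quad \forall i

% KKT conditions: stationarity, dual feasibility, complementary slackness
\theta = \sum_{i} \lambda_i\, y_i\, \nabla_{\theta} f(\theta; x_i),
\qquad \lambda_i \ge 0,
\qquad \lambda_i \bigl( y_i\, f(\theta; x_i) - 1 \bigr) = 0

% Hypothetical scoring step over a candidate superset of m samples:
% fit nonnegative coefficients that best reconstruct the parameters
\hat{\lambda} = \arg\min_{\lambda \ge 0}
\Bigl\| \theta - \sum_{i=1}^{m} \lambda_i\, y_i\, \nabla_{\theta} f(\theta; x_i) \Bigr\|^{2}
```

Under this reading, candidates that receive a large coefficient $\hat{\lambda}_i$ contribute most to reconstructing $\theta$ and would be the ones flagged as likely training samples, while non-members would tend to receive coefficients near zero.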