The Hidden Cost of Modeling P(X): Vulnerability to Membership Inference Attacks in Generative Text Classifiers
2510.16122v1
cs.CR, cs.CL, cs.LG, stat.ML
2025-10-22
Авторы:
Owais Makroo, Siva Rajesh Kasa, Sumegh Roychowdhury, Karan Gupta, Nikhil Pattisapu, Santhosh Kasa, Sumit Negi
Abstract
Membership Inference Attacks (MIAs) pose a critical privacy threat by
enabling adversaries to determine whether a specific sample was included in a
model's training dataset. Despite extensive research on MIAs, systematic
comparisons between generative and discriminative classifiers remain limited.
This work addresses this gap by first providing theoretical motivation for why
generative classifiers exhibit heightened susceptibility to MIAs, then
validating these insights through comprehensive empirical evaluation. Our study
encompasses discriminative, generative, and pseudo-generative text classifiers
across varying training data volumes, evaluated on nine benchmark datasets.
Employing a diverse array of MIA strategies, we consistently demonstrate that
fully generative classifiers which explicitly model the joint likelihood
$P(X,Y)$ are most vulnerable to membership leakage. Furthermore, we observe
that the canonical inference approach commonly used in generative classifiers
significantly amplifies this privacy risk. These findings reveal a fundamental
utility-privacy trade-off inherent in classifier design, underscoring the
critical need for caution when deploying generative classifiers in
privacy-sensitive applications. Our results motivate future research directions
in developing privacy-preserving generative classifiers that can maintain
utility while mitigating membership inference vulnerabilities.