Secret-Protected Evolution for Differentially Private Synthetic Text Generation
2510.10990v1
cs.CR, cs.CL, cs.NE
2025-10-15
Авторы:
Tianze Wang, Zhaoyu Chen, Jian Du, Yingtai Xiao, Linjun Zhang, Qiang Yan
Abstract
Text data has become extremely valuable on large language models (LLMs) and
even lead to general artificial intelligence (AGI). A lot of high-quality text
in the real world is private and cannot be freely used due to privacy concerns.
Therefore, differentially private (DP) synthetic text generation has been
proposed, aiming to produce high-utility synthetic data while protecting
sensitive information. However, existing DP synthetic text generation imposes
uniform guarantees that often overprotect non-sensitive content, resulting in
substantial utility loss and computational overhead. Therefore, we propose
Secret-Protected Evolution (SecPE), a novel framework that extends private
evolution with secret-aware protection. Theoretically, we show that SecPE
satisfies $(\mathrm{p}, \mathrm{r})$-secret protection, constituting a
relaxation of Gaussian DP that enables tighter utility-privacy trade-offs,
while also substantially reducing computational complexity relative to baseline
methods. Empirically, across the OpenReview, PubMed, and Yelp benchmarks, SecPE
consistently achieves lower Fr\'echet Inception Distance (FID) and higher
downstream task accuracy than GDP-based Aug-PE baselines, while requiring less
noise to attain the same level of protection. Our results highlight that
secret-aware guarantees can unlock more practical and effective
privacy-preserving synthetic text generation.
Ссылки и действия
Дополнительные ресурсы: