Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
2510.06719v1
cs.CR, cs.CL, cs.LG
2025-10-10
Авторы:
Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma
Abstract
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by
grounding them in external knowledge. However, its application in sensitive
domains is limited by privacy risks. Existing private RAG methods typically
rely on query-time differential privacy (DP), which requires repeated noise
injection and leads to accumulated privacy loss. To address this issue, we
propose DP-SynRAG, a framework that uses LLMs to generate differentially
private synthetic RAG databases. Unlike prior methods, the synthetic text can
be reused once created, thereby avoiding repeated noise injection and
additional privacy costs. To preserve essential information for downstream RAG
tasks, DP-SynRAG extends private prediction, which instructs LLMs to generate
text that mimics subsampled database records in a DP manner. Experiments show
that DP-SynRAG achieves superior performanec to the state-of-the-art private
RAG systems while maintaining a fixed privacy budget, offering a scalable
solution for privacy-preserving RAG.
Ссылки и действия
Дополнительные ресурсы: