The Model's Language Matters: A Comparative Privacy Analysis of LLMs

2510.08813v1 cs.CL, cs.CR 2025-10-14

Authors:

Abhishek K. Mishra, Antoine Boutet, Lucas Magnana

Abstract

Large Language Models (LLMs) are increasingly deployed across multilingual applications that handle sensitive data, yet their scale and linguistic variability introduce major privacy risks. Because these risks have mostly been evaluated for English, this paper investigates how language structure affects privacy leakage in LLMs trained on English, Spanish, French, and Italian medical corpora. We quantify six linguistic indicators and evaluate three attack vectors: extraction, counterfactual memorization, and membership inference. Results show that privacy vulnerability scales with linguistic redundancy and tokenization granularity: Italian exhibits the strongest leakage, while English shows higher membership separability. In contrast, French and Spanish display greater resilience due to higher morphological complexity. Overall, our findings provide the first quantitative evidence that language matters in privacy leakage, underscoring the need for language-aware privacy-preserving mechanisms in LLM deployments.

