Detecting Data Contamination in LLMs via In-Context Learning
2510.27055v1
cs.CL, cs.AI, I.2.7
2025-11-04
Авторы:
Michał Zawalski, Meriem Boubdir, Klaudia Bałazy, Besmira Nushi, Pablo Ribalta
Abstract
We present Contamination Detection via Context (CoDeC), a practical and
accurate method to detect and quantify training data contamination in large
language models. CoDeC distinguishes between data memorized during training and
data outside the training distribution by measuring how in-context learning
affects model performance. We find that in-context examples typically boost
confidence for unseen datasets but may reduce it when the dataset was part of
training, due to disrupted memorization patterns. Experiments show that CoDeC
produces interpretable contamination scores that clearly separate seen and
unseen datasets, and reveals strong evidence of memorization in open-weight
models with undisclosed training corpora. The method is simple, automated, and
both model- and dataset-agnostic, making it easy to integrate with benchmark
evaluations.
Ссылки и действия
Дополнительные ресурсы: