LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS
2511.02089v1
cs.LG, cs.CL
2025-11-06
Авторы:
Stefan F. Schouten, Peter Bloem
Abstract
Contrast-Consistent Search (CCS) is an unsupervised probing method able to
test whether large language models represent binary features, such as sentence
truth, in their internal activations. While CCS has shown promise, its two-term
objective has been only partially understood. In this work, we revisit CCS with
the aim of clarifying its mechanisms and extending its applicability. We argue
that what should be optimized for, is relative contrast consistency. Building
on this insight, we reformulate CCS as an eigenproblem, yielding closed-form
solutions with interpretable eigenvalues and natural extensions to multiple
variables. We evaluate these approaches across a range of datasets, finding
that they recover similar performance to CCS, while avoiding problems around
sensitivity to random initialization. Our results suggest that relativizing
contrast consistency not only improves our understanding of CCS but also opens
pathways for broader probing and mechanistic interpretability methods.
Ссылки и действия
Дополнительные ресурсы: