A Graph Signal Processing Framework for Hallucination Detection in Large Language Models
2510.19117v1
cs.CL, cs.LG, eess.SP, stat.ML
2025-10-24
Авторы:
Valentin Noël
Abstract
Large language models achieve impressive results but distinguishing factual
reasoning from hallucinations remains challenging. We propose a spectral
analysis framework that models transformer layers as dynamic graphs induced by
attention, with token embeddings as signals on these graphs. Through graph
signal processing, we define diagnostics including Dirichlet energy, spectral
entropy, and high-frequency energy ratios, with theoretical connections to
computational stability. Experiments across GPT architectures suggest universal
spectral patterns: factual statements exhibit consistent "energy mountain"
behavior with low-frequency convergence, while different hallucination types
show distinct signatures. Logical contradictions destabilize spectra with large
effect sizes ($g>1.0$), semantic errors remain stable but show connectivity
drift, and substitution hallucinations display intermediate perturbations. A
simple detector using spectral signatures achieves 88.75% accuracy versus 75%
for perplexity-based baselines, demonstrating practical utility. These findings
indicate that spectral geometry may capture reasoning patterns and error
behaviors, potentially offering a framework for hallucination detection in
large language models.