HACK: Hallucinations Along Certainty and Knowledge Axes
2510.24222v1
cs.CL, I.2.7
2025-10-30
Авторы:
Adi Simhi, Jonathan Herzig, Itay Itzhak, Dana Arad, Zorik Gekhman, Roi Reichart, Fazl Barez, Gabriel Stanovsky, Idan Szpektor, Yonatan Belinkov
Abstract
Hallucinations in LLMs present a critical barrier to their reliable usage.
Existing research usually categorizes hallucination by their external
properties rather than by the LLMs' underlying internal properties. This
external focus overlooks that hallucinations may require tailored mitigation
strategies based on their underlying mechanism. We propose a framework for
categorizing hallucinations along two axes: knowledge and certainty. Since
parametric knowledge and certainty may vary across models, our categorization
method involves a model-specific dataset construction process that
differentiates between those types of hallucinations. Along the knowledge axis,
we distinguish between hallucinations caused by a lack of knowledge and those
occurring despite the model having the knowledge of the correct response. To
validate our framework along the knowledge axis, we apply steering mitigation,
which relies on the existence of parametric knowledge to manipulate model
activations. This addresses the lack of existing methods to validate knowledge
categorization by showing a significant difference between the two
hallucination types. We further analyze the distinct knowledge and
hallucination patterns between models, showing that different hallucinations do
occur despite shared parametric knowledge. Turning to the certainty axis, we
identify a particularly concerning subset of hallucinations where models
hallucinate with certainty despite having the correct knowledge internally. We
introduce a new evaluation metric to measure the effectiveness of mitigation
methods on this subset, revealing that while some methods perform well on
average, they fail disproportionately on these critical cases. Our findings
highlight the importance of considering both knowledge and certainty in
hallucination analysis and call for targeted mitigation approaches that
consider the hallucination underlying factors.
Ссылки и действия
Дополнительные ресурсы: