Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics
2511.04527v1
cs.CL, cs.AI
2025-11-08
Authors:
Amir Zur, Atticus Geiger, Ekdeep Singh Lubana, Eric Bigelow
Abstract
When a language model generates text, the selection of individual tokens
might lead it down very different reasoning paths, making uncertainty difficult
to quantify. In this work, we consider whether reasoning language models
represent the alternate paths that they could take during generation. To test
this hypothesis, we use hidden activations to control and predict a language
model's uncertainty during chain-of-thought reasoning. In our experiments, we
find a clear correlation between how uncertain a model is at different tokens
and how easily the model can be steered by controlling its activations. This
suggests that activation interventions are most effective when there are
alternate paths available to the model -- in other words, when it has not yet
committed to a particular final answer. We also find that hidden activations
can predict a model's future outcome distribution, demonstrating that models
implicitly represent the space of possible paths.
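
As a rough illustration of what "uncertainty at a token" can mean here, the sketch below estimates it as the entropy of the final answers reached by continuations sampled from that position. This is only one plausible reading and not necessarily the authors' procedure. The function name `outcome_entropy` and the sample data are hypothetical.

```python
import math
from collections import Counter


def outcome_entropy(final_answers):
    """Shannon entropy (in bits) of the empirical final-answer distribution.

    `final_answers` holds the final answers reached by continuations
    sampled from one token position (hypothetical input format).
    """
    if not final_answers:
        return 0.0
    counts = Counter(final_answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical data: answers reached by 4 sampled continuations from each
# of three positions in a chain of thought (not results from the paper).
samples_per_position = [
    ["A", "B", "C", "D"],  # early: many paths still open
    ["A", "A", "B", "B"],  # middle: two candidate answers remain
    ["A", "A", "A", "A"],  # late: the model has committed to one answer
]
entropies = [outcome_entropy(s) for s in samples_per_position]
```

Under this reading, high entropy marks tokens where alternate paths are still available, and entropy near zero marks tokens where the model has effectively committed to an answer.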