First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
2511.04715v1
cs.CL, cs.AI, cs.LG
2025-11-11
Авторы:
Dmytro Vitel, Anshuman Chhabra
Abstract
Identifying how training samples influence/impact Large Language Model (LLM)
decision-making is essential for effectively interpreting model decisions and
auditing large-scale datasets. Current training sample influence estimation
methods (also known as influence functions) undertake this goal by utilizing
information flow through the model via its first-order and higher-order
gradient terms. However, owing to the large model sizes of today consisting of
billions of parameters, these influence computations are often restricted to
some subset of model layers to ensure computational feasibility. Prior seminal
work by Yeh et al. (2022) in assessing which layers are best suited for
computing language data influence concluded that the first (embedding) layers
are the most informative for this purpose, using a hypothesis based on
influence scores canceling out (i.e., the cancellation effect). In this work,
we propose theoretical and empirical evidence demonstrating how the
cancellation effect is unreliable, and that middle attention layers are better
estimators for influence. Furthermore, we address the broader challenge of
aggregating influence scores across layers, and showcase how alternatives to
standard averaging (such as ranking and vote-based methods) can lead to
significantly improved performance. Finally, we propose better methods for
evaluating influence score efficacy in LLMs without undertaking model
retraining, and propose a new metric known as the Noise Detection Rate (NDR)
that exhibits strong predictive capability compared to the cancellation effect.
Through extensive experiments across LLMs of varying types and scales, we
concretely determine that the first (layers) are not necessarily better than
the last (layers) for LLM influence estimation, contrasting with prior
knowledge in the field.
Ссылки и действия
Дополнительные ресурсы: