Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment
2510.24208v1
cs.CL, cs.LG
2025-10-30
Авторы:
Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang
Abstract
Large Language Models (LLMs) encode vast amounts of knowledge in their
massive parameters, which is accessible to locate, trace, and analyze. Despite
advances in neural interpretability, it is still not clear how to transfer
knowledge in a fine-grained manner, namely parametric knowledge transfer (PKT).
A key problem is enabling effective and efficient knowledge transfer across
LLMs of different scales, which is essential for achieving greater flexibility
and broader applicability in transferring knowledge between LLMs. Due to neural
incompatibility, referring to the architectural and parametric differences
between LLMs of varying scales, existing methods that directly reuse layer
parameters are severely limited. In this paper, we identify the semantic
alignment in latent space as the fundamental prerequisite for LLM cross-scale
knowledge transfer. Instead of directly using the layer parameters, our
approach takes activations as the medium of layer-wise knowledge transfer.
Leveraging the semantics in latent space, our approach is simple and
outperforms prior work, better aligning model behaviors across varying scales.
Evaluations on four benchmarks demonstrate the efficacy of our method. Further
analysis reveals the key factors easing cross-scale knowledge transfer and
provides insights into the nature of latent semantic alignment.
Ссылки и действия
Дополнительные ресурсы: