Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures
arXiv:2510.16968v1
cs.LG, cs.AI, cs.CL
2025-10-22
Authors:
Pingzhi Li, Morris Yu-Chao Huang, Zhen Tan, Qingquan Song, Jie Peng, Kai Zou, Yu Cheng, Kaidi Xu, Tianlong Chen
Abstract
Knowledge Distillation (KD) accelerates the training of large language models
(LLMs) but poses risks to intellectual property protection and LLM diversity.
Existing KD detection methods based on self-identity or output similarity can
be easily evaded through prompt engineering. We present a KD detection
framework effective in both white-box and black-box settings by exploiting an
overlooked signal: the transfer of MoE "structural habits", especially internal
routing patterns. Our approach analyzes how different experts specialize and
collaborate across various inputs, creating distinctive fingerprints that
persist through the distillation process. To extend beyond the white-box setup
and MoE architectures, we further propose Shadow-MoE, a black-box method that
constructs proxy MoE representations via auxiliary distillation to compare
these patterns between arbitrary model pairs. We establish a comprehensive,
reproducible benchmark that offers diverse distilled checkpoints and an
extensible framework to facilitate future research. Extensive experiments
demonstrate >94% detection accuracy across various scenarios and strong
robustness to prompt-based evasion, outperforming existing baselines while
highlighting the transfer of structural habits in LLMs.
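To make the idea of a routing "fingerprint" concrete, here is a minimal sketch of the white-box case for a single MoE layer. It assumes that both models expose their top-k expert choices for the same probe tokens and that both layers have the same number of experts. The two statistics (per-expert usage frequency as a proxy for specialization, pairwise co-activation as a proxy for collaboration) and the two comparison metrics (Jensen-Shannon divergence and cosine similarity) are illustrative choices, not necessarily the ones used in the paper. All function names are hypothetical. The black-box Shadow-MoE setting, which first builds proxy MoE models via auxiliary distillation, is not covered.

```python
import math
from itertools import combinations

# Hypothetical illustration: `routes` is a list where each entry holds the
# expert indices chosen (top-k) for one probe token at a single MoE layer.

def expert_usage(routes, num_experts):
    """Fraction of routing slots assigned to each expert (specialization profile)."""
    counts = [0.0] * num_experts
    for token_experts in routes:
        for e in token_experts:
            counts[e] += 1.0
    total = sum(counts)
    if total == 0:
        raise ValueError("routes must contain at least one expert assignment")
    return [c / total for c in counts]

def co_activation(routes, num_experts):
    """Normalized, symmetric counts of expert pairs selected for the same token."""
    mat = [[0.0] * num_experts for _ in range(num_experts)]
    for token_experts in routes:
        for a, b in combinations(sorted(set(token_experts)), 2):
            mat[a][b] += 1.0
            mat[b][a] += 1.0
    total = sum(sum(row) for row in mat)
    if total == 0:
        return mat
    return [[v / total for v in row] for row in mat]

def js_divergence(p, q):
    """Jensen-Shannon divergence in nats; 0.0 means identical distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 here.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cosine(u, v):
    """Cosine similarity of two flat vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def routing_similarity(routes_a, routes_b, num_experts):
    """Compare two models' routing habits on the same probe tokens."""
    usage_jsd = js_divergence(expert_usage(routes_a, num_experts),
                              expert_usage(routes_b, num_experts))
    flat_a = [v for row in co_activation(routes_a, num_experts) for v in row]
    flat_b = [v for row in co_activation(routes_b, num_experts) for v in row]
    return {"usage_jsd": usage_jsd, "coact_cosine": cosine(flat_a, flat_b)}

if __name__ == "__main__":
    # Toy top-2 routes over 4 experts (made-up data, for illustration only).
    teacher = [[0, 1], [0, 1], [2, 3], [0, 2]]
    student = [[0, 1], [0, 1], [2, 3], [0, 3]]    # mostly copies the teacher's habits
    unrelated = [[1, 3], [2, 3], [1, 2], [1, 3]]  # different routing habits
    print(routing_similarity(teacher, student, 4))
    print(routing_similarity(teacher, unrelated, 4))
```

On this toy data, the "student" shows a lower usage divergence and a higher co-activation similarity to the "teacher" than the unrelated model does. In a real detector, such scores would need to be aggregated across layers and many probe inputs and then calibrated against known distilled and non-distilled pairs before any decision threshold is set.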