From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance
2510.11056v1
cs.IR, cs.AI
2025-10-16
Authors:
Runze Xia, Yupeng Ji, Yuxi Zhou, Haodong Liu, Teng Zhang, Piji Li
Abstract
Query-service relevance prediction in e-commerce search systems faces strict
latency requirements that prevent the direct application of Large Language
Models (LLMs). To bridge this gap, we propose a two-stage reasoning
distillation framework to transfer reasoning capabilities from a powerful
teacher LLM to a lightweight, deployment-friendly student model. In the first
stage, we address the limitations of general-purpose LLMs by constructing a
domain-adapted teacher model. This is achieved through a three-step process:
domain-adaptive pre-training to inject platform knowledge, supervised
fine-tuning to elicit reasoning skills, and preference optimization with a
multi-dimensional reward model to ensure the generation of reliable and
preference-aligned reasoning paths. This teacher can then automatically
annotate massive query-service pairs from search logs with both relevance
labels and reasoning chains. In the second stage, to address the challenges of
architectural heterogeneity in standard distillation, we introduce Contrastive
Reasoning Self-Distillation (CRSD). By modeling the behavior of the same
student model under "standard" and "reasoning-augmented" inputs as a
teacher-student relationship, CRSD enables the lightweight model to internalize
the teacher's complex decision-making mechanisms without needing the explicit
reasoning path at inference. Offline evaluations and online A/B testing in the
Meituan search advertising system demonstrate that our framework achieves
significant improvements across multiple metrics, validating its effectiveness
and practical value.
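
The self-distillation idea behind CRSD can be illustrated with a small sketch. The abstract does not give the loss function, so everything below is an assumption made for illustration: the function names, the weighting parameter `alpha`, and the choice of cross-entropy plus a KL term are hypothetical, and the contrastive component of CRSD is not modeled here. The sketch only shows the general pattern in which the same student scores a "standard" input and a "reasoning-augmented" input, and the augmented branch serves as the soft target for the standard branch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the gold relevance label."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(std_logits, aug_logits, label, alpha=0.5):
    """Hypothetical loss: supervised terms on both branches plus a KL term.

    std_logits: student output for the standard (query, service) input.
    aug_logits: output of the SAME student when the teacher LLM's
                reasoning chain is appended to the input.
    alpha:      assumed weight on the distillation term (not from the paper).
    """
    p_std = softmax(std_logits)
    p_aug = softmax(aug_logits)
    # Both branches are supervised with the relevance label.
    supervised = cross_entropy(p_std, label) + cross_entropy(p_aug, label)
    # The reasoning-augmented branch acts as the soft target ("teacher").
    distill = kl_divergence(p_aug, p_std)
    return supervised + alpha * distill


# Example: binary relevance, gold label 1 (relevant).
loss = self_distillation_loss(std_logits=[0.2, 0.4],
                              aug_logits=[-1.0, 2.0],
                              label=1)
print(f"loss = {loss:.4f}")
```

In an autodiff framework, the augmented-branch probabilities would typically be treated as constants in the KL term, so that only the standard branch is pulled toward them. At inference only the standard branch is used, which matches the abstract's claim that no explicit reasoning path is needed at serving time.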