Frustratingly Easy Task-aware Pruning for Large Language Models
2510.22489v1
cs.CL, cs.LG
2025-10-29
Авторы:
Yuanhe Tian, Junjie Liu, Xican Yang, Haishan Ye, Yan Song
Abstract
Pruning provides a practical solution to reduce the resources required to run
large language models (LLMs) to benefit from their effective capabilities as
well as control their cost for training and inference. Research on LLM pruning
often ranks the importance of LLM parameters using their magnitudes and
calibration-data activations and removes (or masks) the less important ones,
accordingly reducing LLMs' size. However, these approaches primarily focus on
preserving the LLM's ability to generate fluent sentences, while neglecting
performance on specific domains and tasks. In this paper, we propose a simple
yet effective pruning approach for LLMs that preserves task-specific
capabilities while shrinking their parameter space. We first analyze how
conventional pruning minimizes loss perturbation under general-domain
calibration and extend this formulation by incorporating task-specific feature
distributions into the importance computation of existing pruning algorithms.
Thus, our framework computes separate importance scores using both general and
task-specific calibration data, partitions parameters into shared and exclusive
groups based on activation-norm differences, and then fuses their scores to
guide the pruning process. This design enables our method to integrate
seamlessly with various foundation pruning techniques and preserve the LLM's
specialized abilities under compression. Experiments on widely used benchmarks
demonstrate that our approach is effective and consistently outperforms the
baselines with identical pruning ratios and different settings.
Ссылки и действия
Дополнительные ресурсы: