Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes
2511.02681v1
cs.CL, cs.AI, cs.LG
2025-11-06
Авторы:
Mohammadsajad Alipour, Mohammad Mohammadi Amiri
Abstract
Large language models (LLMs) are increasingly prevalent across diverse
applications. However, their enormous size limits storage and processing
capabilities to a few well-resourced stakeholders. As a result, most
applications rely on pre-trained LLMs, fine-tuned for specific tasks. However,
even storing the fine-tuned versions of these models remains a significant
challenge due to the wide range of tasks they address. Recently, studies show
that fine-tuning these models primarily affects a small fraction of parameters,
highlighting the need for more efficient storage of fine-tuned models. This
paper focuses on efficient storage of parameter updates in pre-trained models
after fine-tuning. To address this challenge, we leverage the observation that
fine-tuning updates are both low-rank and sparse, which can be utilized for
storage efficiency. However, using only low-rank approximation or
sparsification may discard critical singular components that enhance model
expressivity. We first observe that given the same memory budget, sparsified
low-rank approximations with larger ranks outperform standard low-rank
approximations with smaller ranks. Building on this, we propose our method,
optimal singular damage, that selectively sparsifies low-rank approximated
updates by leveraging the interleaved importance of singular vectors, ensuring
that the most impactful components are retained. We demonstrate through
extensive experiments that our proposed methods lead to significant storage
efficiency and superior accuracy within the same memory budget compared to
employing the low-rank approximation or sparsification individually.
Ссылки и действия
Дополнительные ресурсы: