On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices

2511.19986v1 cs.LG, cs.AI, cs.CV 2025-11-26

Authors:

Lianming Huang, Haibo Hu, Qiao Li, Nan Guan, Chun Jason Xue

Abstract

Sparsity is essential for deploying large models on resource-constrained edge platforms. However, optimizing sparsity patterns for individual tasks in isolation ignores the significant I/O overhead incurred during frequent task switching. We introduce an on-demand multi-task sparsity framework specifically designed to minimize switching costs by maximizing parameter reuse. Unlike monolithic approaches, we decompose weights into reusable block-granular units and align sparse structures across tasks to maximize overlap. By dynamically loading only the small differential set of blocks required for the next task, our method effectively mitigates the cold-start latency inherent in traditional monolithic approaches. Experiments on a real-world autonomous driving platform demonstrate that our framework achieves superior switching efficiency, accelerating task switching by over 6.6× on average compared to existing sparsity methods.
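
The differential-loading idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the task names, block IDs, and the helpers `plan_switch` and `switch_task` are hypothetical, and each task's sparse model is assumed to be representable as a set of active weight-block IDs. On a switch, only blocks that the next task needs and that are not already resident are loaded.

```python
from typing import Dict, Set, Tuple

# Hypothetical mapping from task name to the IDs of its active weight blocks.
# The values are illustrative only and do not come from the paper.
TASK_BLOCKS: Dict[str, Set[int]] = {
    "detection": {0, 1, 2, 3, 5},
    "segmentation": {0, 1, 2, 4, 5},
}


def plan_switch(resident: Set[int], needed: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (blocks to load from storage, blocks that may be evicted)."""
    to_load = needed - resident   # differential set: required but not in memory
    to_evict = resident - needed  # in memory but unused by the next task
    return to_load, to_evict


def switch_task(resident: Set[int], task: str) -> Tuple[Set[int], Set[int]]:
    """Return the new resident block set and the blocks that had to be loaded."""
    needed = TASK_BLOCKS[task]
    to_load, to_evict = plan_switch(resident, needed)
    new_resident = (resident - to_evict) | to_load
    return new_resident, to_load


if __name__ == "__main__":
    resident = set(TASK_BLOCKS["detection"])
    resident, loaded = switch_task(resident, "segmentation")
    print(f"loaded {len(loaded)} of {len(TASK_BLOCKS['segmentation'])} blocks")
```

In this toy setup, the more the two block sets overlap, the smaller the differential set becomes, which is the quantity the abstract says the framework aims to minimize by aligning sparse structures across tasks.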
