Elastic Architecture Search for Efficient Language Models

2510.27037v1 cs.CL, cs.AI, cs.LG, cs.NE 2025-11-04

Authors:

Shang Wang

Abstract

As large pre-trained language models become increasingly critical to natural language understanding (NLU) tasks, their substantial computational and memory requirements have raised significant economic and environmental concerns. Addressing these challenges, this paper introduces the Elastic Language Model (ELM), a novel neural architecture search (NAS) method optimized for compact language models. ELM extends existing NAS approaches by introducing a flexible search space with efficient transformer blocks and dynamic modules for dimension and head number adjustment. These innovations enhance the efficiency and flexibility of the search process, which facilitates more thorough and effective exploration of model architectures. We also introduce novel knowledge distillation losses that preserve the unique characteristics of each block, in order to improve the discrimination between architectural choices during the search process. Experiments on masked language modeling and causal language modeling tasks demonstrate that models discovered by ELM significantly outperform existing methods.
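The abstract does not say how ELM implements its dynamic modules for dimension and head-number adjustment. The Python sketch below is a rough illustration of one common weight-sharing pattern in elastic NAS, not a reproduction of ELM. Every candidate sub-network reuses a prefix slice of a single maximal projection matrix, so a smaller hidden size takes fewer rows and fewer heads take fewer columns. The search-space bounds (`MAX_HIDDEN`, `MAX_HEADS`, `HEAD_DIM`), the candidate values, and all function names are hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass

# Hypothetical search-space bounds (not from the paper).
MAX_HEADS = 4
HEAD_DIM = 2
MAX_HIDDEN = MAX_HEADS * HEAD_DIM  # 8


@dataclass
class SubnetConfig:
    hidden: int  # active input (hidden) dimension
    heads: int   # active number of attention heads


def make_matrix(rows, cols, seed=0):
    """Random rows x cols matrix standing in for trained supernet weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# One shared query projection of maximal size; all sub-networks slice it.
W_q_full = make_matrix(MAX_HIDDEN, MAX_HEADS * HEAD_DIM)


def slice_projection(W, cfg):
    """Prefix slice: first cfg.hidden rows, first cfg.heads * HEAD_DIM columns."""
    out_cols = cfg.heads * HEAD_DIM
    return [row[:out_cols] for row in W[:cfg.hidden]]


def matvec(x, W):
    """Compute x @ W for a vector x of length len(W)."""
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]


def split_heads(q, heads):
    """Reshape a flat projection of length heads * HEAD_DIM into per-head chunks."""
    return [q[h * HEAD_DIM:(h + 1) * HEAD_DIM] for h in range(heads)]


def sample_config(rng):
    """Uniformly sample one sub-network from the (hypothetical) search space."""
    return SubnetConfig(hidden=rng.choice([4, 6, 8]), heads=rng.choice([1, 2, 4]))


if __name__ == "__main__":
    cfg = SubnetConfig(hidden=4, heads=2)
    W = slice_projection(W_q_full, cfg)
    q = matvec([1.0, 0.0, 0.0, 0.0], W)
    print(len(W), len(W[0]), len(split_heads(q, cfg.heads)))
```

Because every smaller configuration is a prefix of the full matrix, all sampled sub-networks train the same shared parameters. This makes evaluating many architectures cheap, at the cost of coupling their weights, which is one reason distillation signals that better separate candidate architectures can matter during search.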


