Elastic Architecture Search for Efficient Language Models
2510.27037v1
cs.CL, cs.AI, cs.LG, cs.NE
2025-11-04
Авторы:
Shang Wang
Abstract
As large pre-trained language models become increasingly critical to natural
language understanding (NLU) tasks, their substantial computational and memory
requirements have raised significant economic and environmental concerns.
Addressing these challenges, this paper introduces the Elastic Language Model
(ELM), a novel neural architecture search (NAS) method optimized for compact
language models. ELM extends existing NAS approaches by introducing a flexible
search space with efficient transformer blocks and dynamic modules for
dimension and head number adjustment. These innovations enhance the efficiency
and flexibility of the search process, which facilitates more thorough and
effective exploration of model architectures. We also introduce novel knowledge
distillation losses that preserve the unique characteristics of each block, in
order to improve the discrimination between architectural choices during the
search process. Experiments on masked language modeling and causal language
modeling tasks demonstrate that models discovered by ELM significantly
outperform existing methods.