Subject or Style: Adaptive and Training-Free Mixture of LoRAs

2508.02165v1 cs.CV, cs.CL 2025-08-09

Authors:

Jia-Chen Zhang, Yu-Jie Xiong

Summary

Fine-tuning with Low-Rank Adaptation (LoRA) performs remarkably well on subject-driven and style-driven generation tasks. However, existing approaches for combining LoRAs often fail to balance subject and style and require additional training. The paper proposes EST-LoRA, a training-free, adaptive fusion method that addresses these limitations. It relies on three key factors: the energy of the matrix, style discrepancy scores, and time steps, and uses them to choose adaptively between the style LoRA and the subject LoRA within each attention layer. Much like the Mixture of Experts (MoE) architecture, this selection keeps the contributions of the two components balanced during generation. Experiments show that EST-LoRA outperforms current approaches in both qualitative and quantitative evaluations and generates faster than other efficient fusion methods. The source code is publicly available.

Abstract

Fine-tuning models via Low-Rank Adaptation (LoRA) demonstrates remarkable performance in subject-driven or style-driven generation tasks. Studies have explored combinations of different LoRAs to jointly generate learned styles and content. However, current methods struggle to balance the original subject and style, and often require additional training. Recently, K-LoRA proposed a training-free LoRA fusion method, but it involves multiple hyperparameters, making it difficult to adapt to all styles and subjects. In this paper, we propose EST-LoRA, a training-free adaptive LoRA fusion method. It comprehensively considers three critical factors: Energy of matrix (E), Style discrepancy scores (S), and Time steps (T). Analogous to the Mixture of Experts (MoE) architecture, the model adaptively selects between subject LoRA and style LoRA within each attention layer. This integrated selection mechanism ensures balanced contributions from both components during the generation process. Experimental results show that EST-LoRA outperforms state-of-the-art methods in both qualitative and quantitative evaluations and achieves faster generation speed compared to other efficient fusion approaches. Our code is publicly available at: https://anonymous.4open.science/r/EST-LoRA-F318.
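The abstract does not give the selection rule itself, so the following is only a rough sketch of what a per-layer, MoE-style choice between a subject LoRA and a style LoRA might look like. Everything specific here is an assumption made for illustration and is not taken from the paper: "energy" is taken to be the squared Frobenius norm of the LoRA update, `style_weight` is a hypothetical function that combines a style discrepancy score with the denoising step, and the choice to favor the subject early and the style late is likewise assumed. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lora_delta(b: Matrix, a: Matrix) -> Matrix:
    """Low-rank weight update B @ A, with B of shape (d, r) and A of shape (r, k)."""
    return [
        [sum(b_row[k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
        for b_row in b
    ]


def energy(m: Matrix) -> float:
    """ASSUMPTION: 'energy of matrix' modeled as the squared Frobenius norm."""
    return sum(x * x for row in m for x in row)


def style_weight(step: int, total_steps: int, style_discrepancy: float) -> float:
    """HYPOTHETICAL scaling: grows linearly from 0.5x to 1.5x over the schedule,
    multiplied by a style discrepancy score supplied by the caller."""
    progress = step / max(total_steps - 1, 1)
    return style_discrepancy * (0.5 + progress)


def select_lora(
    delta_subject: Matrix,
    delta_style: Matrix,
    step: int,
    total_steps: int,
    style_discrepancy: float,
) -> Tuple[str, Matrix]:
    """Pick exactly one LoRA update for this layer and step (hard, MoE-like routing)."""
    e_subject = energy(delta_subject)
    e_style = energy(delta_style) * style_weight(step, total_steps, style_discrepancy)
    if e_subject >= e_style:
        return "subject", delta_subject
    return "style", delta_style


if __name__ == "__main__":
    subject = lora_delta([[1.0], [0.0]], [[1.0, 1.0]])  # toy subject update
    style = lora_delta([[0.0], [1.0]], [[1.0, 1.0]])    # toy style update
    for t in (0, 5, 10):
        name, _ = select_lora(subject, style, t, 11, style_discrepancy=1.0)
        print(t, name)
```

In this toy setup both updates carry the same energy, so the choice is driven entirely by the assumed time-dependent weight. The actual EST-LoRA criterion may combine the three factors quite differently.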
