Adapting Multilingual Models to Code-Mixed Tasks via Model Merging
2510.19782v2
cs.CL, I.2.7
2025-10-24
Авторы:
Prashant Kodali, Vaishnavi Shivkumar, Swarang Joshi, Monojit Choudhary, Ponnurangam Kumaraguru, Manish Shrivastava
Abstract
We study model merging as a practical alternative to conventional adaptation
strategies for code-mixed NLP. Starting from a multilingual base model, we: (i)
perform continued pre-training (CPT) on unlabeled code-mixed text to obtain an
adapted checkpoint, (ii) merge checkpoint with the base model, and (iii)
fine-tune (FT) on the downstream task data. We evaluate our approach for
sentence classification (sentiment and hate speech) task in English-Hindi
(En-Hi) and English-Spanish (En-Es) using XLM-R and Llama-3.2-1B models. Our
results show that merged models consistently outperform full fine-tuning and
CPT->FT. We observe gains of 2--5 points in F1 over full fine-tuning and ~1-2
points over CPT->FT, indicating that unlabeled data is leveraged more
effectively via merging than via CPT alone. Zero-/few-shot prompting with
larger LLMs (e.g., Llama-3.3-70B) lags behind fine-tuned and merged
checkpoints, underscoring limits of in-context learning for code-mixed inputs.
We further test cross-pair transfer by training on En-Hi and evaluating on
En-Ta and En-Ml: merged checkpoints transfer more strongly than
monolingual-English baselines (e.g., TV/TIES variants reaching 0.65-0.68 F1 vs
0.61-0.63 for full fine-tuning), suggesting that code-mixed knowledge is a more
reliable substrate for low-resource pairs. We conclude with adaptation recipes
matched to common data regimes (labeled only; labeled+unlabeled; transfer-only)
and discuss limitations and scaling considerations for broader tasks and larger
models.
Ссылки и действия
Дополнительные ресурсы: