MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
2511.03942v1
cs.SD, cs.CL, cs.MM
2025-11-08
Authors:
Shih-Lun Wu, Yoon Kim, Cheng-Zhi Anna Huang
Abstract
We present MIDI-LLM, an LLM for generating multitrack MIDI music from
free-form text prompts. Our approach expands a text LLM's vocabulary to include
MIDI tokens and uses a two-stage training recipe to endow it with text-to-MIDI
abilities. By preserving the original LLM's parameter structure, we can
directly leverage the vLLM library for accelerated inference. Experiments show
that MIDI-LLM achieves higher quality, better text control, and faster
inference than the recent Text2midi model. A live demo is available at
https://midi-llm-demo.vercel.app.
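
The vocabulary expansion mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function `expand_vocab`, the MIDI token names, and the mean-plus-noise initialization of the new embedding rows are all assumptions made for illustration. The sketch only shows the general idea that new tokens are appended after the existing ones, so the original token ids and embedding rows stay unchanged.

```python
import random

def expand_vocab(vocab, embeddings, new_tokens, seed=0):
    """Append new tokens to a vocabulary and add matching embedding rows.

    Existing token ids and rows are left untouched. Each new row is set to
    the mean of the existing rows plus small Gaussian noise (an assumed
    initialization, not taken from the paper).
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    mean = [sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)]
    new_vocab = dict(vocab)
    new_embeddings = [list(row) for row in embeddings]
    for tok in new_tokens:
        if tok in new_vocab:
            continue  # skip tokens that already exist
        new_vocab[tok] = len(new_embeddings)
        new_embeddings.append([m + rng.gauss(0.0, 0.01) for m in mean])
    return new_vocab, new_embeddings

# Toy text vocabulary with 2-dimensional embeddings.
text_vocab = {"<pad>": 0, "a": 1, "piano": 2}
text_emb = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]]

# Hypothetical MIDI token names, for illustration only.
midi_tokens = ["<midi_onset_0>", "<midi_pitch_60>", "<midi_dur_4>"]

vocab, emb = expand_vocab(text_vocab, text_emb, midi_tokens)
```

Because the new rows are appended rather than interleaved, the text portion of the embedding table is identical before and after expansion, which is one simple way to keep a pretrained model's existing parameters intact.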