MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers
2511.04376v1
cs.SD, cs.AI, cs.LG, cs.MM, eess.AS
2025-11-08
Authors:
Ali Boudaghi, Hadi Zare
Abstract
Music editing has emerged as an important and practical area of artificial
intelligence, with applications ranging from video game and film music
production to personalizing existing tracks according to user preferences.
However, existing models face significant limitations: they are restricted
to editing music synthesized by their own models, require highly
precise prompts, or need task-specific retraining, and thus lack true
zero-shot capability. Leveraging recent advances in rectified flow and
diffusion transformers, we introduce MusRec, the first zero-shot text-to-music
editing model capable of performing diverse editing tasks on real-world music
efficiently and effectively. Experimental results demonstrate that our approach
outperforms existing methods in preserving musical content, structural
consistency, and editing fidelity, establishing a strong foundation for
controllable music editing in real-world scenarios.