SyMuPe: Affective and Controllable Symbolic Music Performance
2511.03425v1
cs.SD, cs.LG, cs.MM
2025-11-07
Authors:
Ilya Borovik, Dmitrii Gavrilev, Vladimir Viro
Abstract
Emotions are fundamental to the creation and perception of music
performances. However, achieving human-like expression and emotion through
machine learning models for performance rendering remains a challenging task.
In this work, we present SyMuPe, a novel framework for developing and training
affective and controllable symbolic piano performance models. Our flagship
model, PianoFlow, uses conditional flow matching trained to solve diverse
multi-mask performance inpainting tasks. By design, it supports both
unconditional generation and infilling of music performance features. For
training, we use a curated, cleaned dataset of 2,968 hours of aligned musical
scores and expressive MIDI performances. For text and emotion control, we
integrate a piano performance emotion classifier and tune PianoFlow with the
emotion-weighted Flan-T5 text embeddings provided as conditional inputs.
Objective and subjective evaluations against transformer-based baselines and
existing models show that PianoFlow not only outperforms other approaches, but
also achieves performance quality comparable to that of human-recorded and
transcribed MIDI samples. For emotion control, we present and analyze samples
generated under different text conditioning scenarios. The developed model can
be integrated into interactive applications, contributing to the creation of
more accessible and engaging music performance systems.
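
As a rough illustration of what training with conditional flow matching on a masked inpainting task can look like, the sketch below builds one training example and a loss restricted to the masked features. It is not PianoFlow's implementation. The abstract does not specify the probability path, the noise prior, or how masked context is fed to the model, so the straight-line path, the standard Gaussian prior, the zero-filled context, and the names `cfm_inpainting_example` and `masked_mse` are all assumptions made for illustration.

```python
import random


def cfm_inpainting_example(x1, mask, t, rng):
    """Build one flow-matching training example for feature inpainting.

    x1   : target performance features (e.g. per-note velocity or timing)
    mask : True where a feature must be generated, False where it is given
    t    : flow time in [0, 1]
    rng  : random.Random instance used to draw the noise sample
    """
    # Assumed prior: one standard Gaussian noise value per feature.
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    # Assumed path: straight line from noise (t = 0) to data (t = 1).
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # The velocity of that straight path is the regression target.
    target = [b - a for a, b in zip(x0, x1)]
    # Context shown to the model: known features kept, masked ones zeroed.
    context = [0.0 if m else b for m, b in zip(mask, x1)]
    return xt, target, context


def masked_mse(pred, target, mask):
    """Mean squared error over masked (to-be-generated) positions only."""
    errs = [(p - v) ** 2 for p, v, m in zip(pred, target, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0
```

Under these assumptions, a mask that is True everywhere corresponds to unconditional generation, while a partial mask corresponds to infilling the remaining features from the given ones.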