SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
2510.00395v1
cs.SD, cs.AI, cs.LG, eess.AS
2025-10-04
Authors:
Jiaye Tan, Haonan Luo, Linfeng Song, Shuaiqi Chen, Yishan Lyu, Zian Zhong, Roujia Wang, Daniel Jiang, Haoran Zhang, Jiaming Bai, Haoran Cheng, Q. Vera Liao, Hao-Wen Dong
Abstract
Low-latency symbolic music generation is essential for real-time
improvisation and human-AI co-creation. Existing transformer-based models,
however, face a trade-off between inference speed and musical quality.
Traditional acceleration techniques such as embedding pooling significantly
degrade quality, while recently proposed Byte Pair Encoding (BPE) methods,
though effective on single-track piano data, suffer large performance drops in
multi-track settings, as revealed by our analysis. We propose
Attribute-Specialized Key-Value Head Sharing (AS-KVHS), adapted to music's
structured symbolic representation, achieving about 30% inference speedup with
only a negligible (about 0.4%) quality drop in objective evaluations and slight
improvements in subjective listening tests. Our main contributions are (1) the
first systematic study of BPE's generalizability in multi-track symbolic music,
and (2) the introduction of AS-KVHS for low-latency symbolic music generation.
Beyond these, we also release SAGE-Music, an open-source benchmark that matches
or surpasses state-of-the-art models in generation quality.
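
To make the general idea of key-value head sharing concrete, the sketch below shows a uniform grouping in which several query heads read from one shared key-value head, and how that shrinks the key-value cache kept during autoregressive decoding. It is only an illustration of generic head sharing under assumed settings: the abstract does not specify how AS-KVHS assigns heads to musical attributes, so the uniform grouping, the function names (`kv_head_for_query`, `kv_cache_elements`), and the model dimensions are all hypothetical and are not taken from the paper.

```python
def kv_head_for_query(q_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Return the index of the shared key-value head used by a query head.

    Assumes a uniform grouping (consecutive query heads share one KV head);
    this is an illustrative choice, not the paper's attribute-based assignment.
    """
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be divisible by n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return q_head // group_size


def kv_cache_elements(n_layers: int, n_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Count cached scalars: keys and values (factor 2) for every layer and KV head."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len


# Hypothetical model dimensions, chosen only for illustration.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN = 12, 8, 64, 1024
N_SHARED_KV_HEADS = 2

# Cache size with one KV head per query head versus with shared KV heads.
full_cache = kv_cache_elements(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
shared_cache = kv_cache_elements(N_LAYERS, N_SHARED_KV_HEADS, HEAD_DIM, SEQ_LEN)

# Which shared KV head each of the 8 query heads reads from.
mapping = [
    kv_head_for_query(h, N_QUERY_HEADS, N_SHARED_KV_HEADS)
    for h in range(N_QUERY_HEADS)
]

print(mapping)                     # [0, 0, 0, 0, 1, 1, 1, 1]
print(full_cache // shared_cache)  # 4
```

In this toy configuration the cache shrinks in proportion to the number of key-value heads removed. The resulting end-to-end speedup depends on the model and hardware, so this ratio should not be read as a derivation of the roughly 30% figure reported above.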