Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music
2509.24603v1
cs.SD, cs.CV
2025-10-01
Authors:
Tianle Wang, Sirui Zhang, Xinyi Tong, Peiyang Yu, Jishang Chen, Liangke Zhao, Xinpu Gao, Yves Zhu, Tiezheng Ge, Bo Zheng, Duo Xu, Yang Liu, Xin Jin, Feng Yu, Songchun Zhu
Abstract
This paper presents an unsupervised machine learning algorithm that
identifies recurring patterns, referred to as "music-words", from
symbolic music data. These patterns are fundamental to musical structure and
reflect the cognitive processes involved in composition. However, extracting
these patterns remains challenging because of the inherent semantic ambiguity
in musical interpretation. We formulate the task of music-word discovery as a
statistical optimization problem and propose a two-stage
Expectation-Maximization (EM)-based learning framework: (1) developing a
music-word dictionary and (2) reconstructing the music data. When evaluated against
human expert annotations, the algorithm achieved an Intersection over Union
(IoU) score of 0.61. Our findings indicate that minimizing code length
effectively addresses semantic ambiguity, suggesting that human optimization of
encoding systems shapes musical semantics. This approach enables computers to
extract "basic building blocks" from music data, facilitating structural
analysis and sparse encoding. The method has two primary applications. First,
in AI music, it supports downstream tasks such as music generation,
classification, style transfer, and improvisation. Second, in musicology, it
provides a tool for analyzing compositional patterns and offers insights into
the principle of minimal encoding across diverse musical styles and composers.
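The abstract reports an Intersection over Union (IoU) of 0.61 against expert annotations but does not say how IoU is computed. As a rough illustration only, the sketch below assumes that both the predicted and the annotated music-word occurrences are half-open (start, end) spans over note indices, and that IoU is the ratio of shared to total covered note positions. The function name `interval_iou` and the span representation are hypothetical, not taken from the paper.

```python
def interval_iou(pred, gold):
    """IoU between two sets of half-open (start, end) spans over note indices.

    Hypothetical representation: each span marks the notes covered by one
    music-word occurrence. IoU = |covered by both| / |covered by either|.
    """
    def covered(spans):
        # Collect every note index covered by at least one span.
        positions = set()
        for start, end in spans:
            positions.update(range(start, end))
        return positions

    p, g = covered(pred), covered(gold)
    union = p | g
    if not union:
        # Both segmentations are empty: treat as perfect agreement.
        return 1.0
    return len(p & g) / len(union)


if __name__ == "__main__":
    predicted = [(0, 4), (8, 12)]   # notes 0-3 and 8-11
    annotated = [(0, 4), (10, 14)]  # notes 0-3 and 10-13
    print(interval_iou(predicted, annotated))  # 6 shared / 10 covered
```

Under these assumptions, the two segmentations share 6 of 10 covered note positions, giving an IoU of 0.6. The paper may instead match occurrences one-to-one before averaging, which would produce different numbers.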
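The claim that minimizing code length resolves semantic ambiguity can be illustrated with a toy version of the reconstruction stage. This is not the authors' EM algorithm. It assumes a fixed, hypothetical dictionary of pitch patterns with probabilities, charges each dictionary word its Shannon code length (-log2 p bits), charges each note left outside the dictionary a constant `literal_bits`, and uses dynamic programming to find the segmentation with the fewest total bits. In a full EM loop, the dictionary probabilities would be re-estimated from such segmentations. That step is omitted here.

```python
import math


def min_code_length(seq, word_probs, literal_bits):
    """Segment seq into dictionary words or single-note literals, minimizing bits.

    seq: list of note symbols (e.g. MIDI pitches); assumed representation.
    word_probs: dict mapping tuple(pattern) -> probability (hypothetical dictionary).
    literal_bits: assumed constant cost of encoding one note as a literal.
    Returns (total_bits, segmentation), where segmentation is a list of
    (chunk, is_word) pairs.
    """
    n = len(seq)
    best = [math.inf] * (n + 1)  # best[j] = minimal bits to encode seq[:j]
    back = [None] * (n + 1)      # back-pointers for recovering the segmentation
    best[0] = 0.0
    max_len = max((len(w) for w in word_probs), default=1)

    for i in range(n):
        if best[i] == math.inf:
            continue
        # Option 1: encode seq[i] as a literal outside the dictionary.
        if best[i] + literal_bits < best[i + 1]:
            best[i + 1] = best[i] + literal_bits
            back[i + 1] = (i, (seq[i],), False)
        # Option 2: encode a dictionary word starting at position i.
        for length in range(1, max_len + 1):
            if i + length > n:
                break
            chunk = tuple(seq[i:i + length])
            p = word_probs.get(chunk)
            if p:
                cost = best[i] - math.log2(p)
                if cost < best[i + length]:
                    best[i + length] = cost
                    back[i + length] = (i, chunk, True)

    # Walk the back-pointers from the end to recover the segmentation.
    segments = []
    j = n
    while j > 0:
        i, chunk, is_word = back[j]
        segments.append((chunk, is_word))
        j = i
    segments.reverse()
    return best[n], segments


if __name__ == "__main__":
    melody = [60, 62, 64, 60, 62, 64, 67]
    dictionary = {(60, 62, 64): 0.5}  # hypothetical "music-word"
    bits, segs = min_code_length(melody, dictionary, literal_bits=8.0)
    print(bits, segs)
```

With this toy dictionary, the repeated three-note figure costs 1 bit per occurrence, compared with 24 bits as literals. The optimal encoding therefore reuses the word twice and spends 8 bits on the final note, for 10 bits in total. This is the sense in which a shorter code favors reading the repetition as a single unit.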