MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding

2510.16273v1 cs.SD, cs.AI, eess.AS 2025-10-22

Авторы:

Jingyue Huang, Zachary Novack, Phillip Long, Yupeng Hou, Ke Chen, Taylor Berg-Kirkpatrick, Julian McAuley

Abstract

Discrete representation learning has shown promising results across various domains, including generation and understanding in image, speech and language. Inspired by these advances, we propose MuseTok, a tokenization method for symbolic music, and investigate its effectiveness in both music generation and understanding tasks. MuseTok employs the residual vector quantized-variational autoencoder (RQ-VAE) on bar-wise music segments within a Transformer-based encoder-decoder framework, producing music codes that achieve high-fidelity music reconstruction and accurate understanding of music theory. For comprehensive evaluation, we apply MuseTok to music generation and semantic understanding tasks, including melody extraction, chord recognition, and emotion recognition. Models incorporating MuseTok outperform previous representation learning baselines in semantic understanding while maintaining comparable performance in content generation. Furthermore, qualitative analyses on MuseTok codes, using ground-truth categories and synthetic datasets, reveal that MuseTok effectively captures underlying musical concepts from large music collections.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding

Авторы:

Abstract

Ссылки и действия

Связанные статьи

RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS

Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup an...

Multidimensional Music Aesthetic Evaluation via Semantically Consistent C-Mixup ...

Aligning Generative Music AI with Human Preferences: Methods and Challenges

Real-Time Speech Enhancement via a Hybrid ViT: A Dual-Input Acoustic-Image Featu...

Навигация