Learning Relationships Between Separate Audio Tracks for Creative Applications
2509.25296v1
cs.SD, cs.AI, cs.HC, cs.LG, eess.AS
2025-10-02
Authors:
Balthazar Bujard, Jérôme Nika, Frédéric Bevilacqua, Nicolas Obin
Abstract
This paper presents the first step in a research project situated within the
field of musical agents. The objective is to tune, through training, the
desired musical relationship between a live musical input and a real-time
generated musical output by curating a database of separated tracks. We
propose an architecture integrating a symbolic decision
module capable of learning and exploiting musical relationships from such a
musical corpus. We detail an offline implementation of this architecture
employing Transformers as the decision module, associated with a perception
module based on Wav2Vec 2.0, and concatenative synthesis as audio renderer. We
present a quantitative evaluation of the decision module's ability to reproduce
learned relationships extracted during training. We demonstrate that our
decision module can predict a coherent track B when conditioned by its
corresponding "guide" track A, based on a corpus of paired tracks (A, B).
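
The pipeline described above (symbolic tokens in, decision, concatenative rendering out) can be illustrated with a minimal toy sketch. It is not the paper's implementation: a simple co-occurrence table stands in for the Transformer decision module, the integer tokens stand in for symbolic labels that a perception module such as Wav2Vec 2.0 might yield, and every name here (`learn_relationship`, `decide`, `render`, `segments`) is hypothetical.

```python
from collections import Counter, defaultdict


def learn_relationship(pairs):
    """Count which B token co-occurs with each A token at the same time step.

    `pairs` is a list of (track_a, track_b) token sequences. This lookup table
    is a toy stand-in for a trained decision module, not a Transformer.
    """
    table = defaultdict(Counter)
    for track_a, track_b in pairs:
        for a, b in zip(track_a, track_b):
            table[a][b] += 1
    return table


def decide(table, guide, fallback=0):
    """Predict one B token per guide token: the most frequent co-occurrence.

    Guide tokens never seen during training map to `fallback` (an assumption).
    """
    return [
        table[a].most_common(1)[0][0] if a in table else fallback
        for a in guide
    ]


def render(tokens, segments):
    """Concatenative synthesis: chain the stored audio segment of each token."""
    out = []
    for token in tokens:
        out.extend(segments[token])
    return out


# Toy corpus of paired tracks (A, B), already converted to symbolic tokens.
pairs = [([0, 1, 2, 1], [5, 6, 7, 6]), ([2, 2, 0], [7, 7, 5])]
table = learn_relationship(pairs)

# Hypothetical audio segments (lists of samples) indexed by B token.
segments = {5: [0.1, 0.2], 6: [0.3], 7: [0.4, 0.5]}

predicted_b = decide(table, guide=[1, 0, 2])
audio = render(predicted_b, segments)
```

Curating a different corpus of pairs changes the learned table, and therefore the relationship between the guide and the generated output, which is the tuning mechanism the abstract refers to.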