Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders
2511.05350v1
cs.SD, cs.AI
2025-11-11
Authors:
Mathias Rose Bjare, Giorgia Cantisani, Marco Pasini, Stefan Lattner, Gerhard Widmer
Abstract
We argue that training autoencoders to reconstruct inputs from noised
versions of their encodings, when combined with perceptual losses, yields
encodings that are structured according to a perceptual hierarchy. We
demonstrate the emergence of this hierarchical structure by showing that, after
training an audio autoencoder in this manner, perceptually salient information
is captured in coarser representation structures than with conventional
training. Furthermore, we show that such perceptual hierarchies improve latent
diffusion decoding in the context of estimating surprisal in music pitches and
predicting EEG brain responses to music listening. Pretrained weights are
available at github.com/CPJKU/pa-audioic.
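
The training objective described in the abstract can be illustrated with a minimal sketch, not the authors' implementation. It assumes a toy linear encoder and decoder, Gaussian noise added to the encoding, and a mean-squared error standing in for the perceptual loss; the abstract does not specify the noise distribution or the loss. All function names (`encode`, `decode`, `add_noise`, `placeholder_loss`, `noise_augmented_loss`) are hypothetical.

```python
import random


def encode(x, enc_weights):
    """Toy linear encoder (assumption): z = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in enc_weights]


def decode(z, dec_weights):
    """Toy linear decoder (assumption): x_hat = V z."""
    return [sum(v * zi for v, zi in zip(row, z)) for row in dec_weights]


def add_noise(z, sigma, rng):
    """Perturb the encoding with Gaussian noise of standard deviation sigma."""
    return [zi + rng.gauss(0.0, sigma) for zi in z]


def placeholder_loss(x_hat, x):
    """Stand-in for a perceptual loss (assumption): plain mean-squared error."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)


def noise_augmented_loss(x, enc_weights, dec_weights, sigma, rng,
                         loss_fn=placeholder_loss):
    """Reconstruct x from a noised version of its encoding and score the result."""
    z = encode(x, enc_weights)
    z_noisy = add_noise(z, sigma, rng)
    x_hat = decode(z_noisy, dec_weights)
    return loss_fn(x_hat, x)


# Usage: identity weights, so any reconstruction error comes from the noise alone.
identity = [[1.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
clean = noise_augmented_loss([0.5, -0.25], identity, identity, 0.0, rng)
noisy = noise_augmented_loss([0.5, -0.25], identity, identity, 0.5, rng)
```

In this sketch the decoder only ever sees the perturbed encoding, so the loss penalizes encodings whose reconstruction changes sharply under small perturbations. A real setup would replace the toy maps with trained networks and the placeholder with an actual perceptual loss.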