Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization
2510.23530v1
cs.SD, cs.AI, cs.LG, eess.AS
2025-10-29
Authors:
Bernardo Torres, Manuel Moussallam, Gabriel Meseguer-Brocal
Abstract
Audio autoencoders learn useful, compressed audio representations, but their
non-linear latent spaces prevent intuitive algebraic manipulation such as
mixing or scaling. We introduce a simple training methodology that uses data
augmentation to induce linearity in a high-compression Consistency Autoencoder
(CAE). The resulting model exhibits homogeneity (equivariance to scalar gain)
and additivity (the decoder preserves addition) without any change to its
architecture or loss function. When trained with our method, the CAE exhibits
linear behavior in both the encoder and decoder while preserving reconstruction
fidelity. We test the practical utility of our learned space on music source
composition and separation via simple latent arithmetic. This work presents a
straightforward technique for constructing structured latent spaces, enabling
more intuitive and efficient audio processing.
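
The two properties named in the abstract can be written as checks on an encoder E and a decoder D: homogeneity asks that E(a * x) be close to a * E(x) for a scalar gain a, and additivity asks that D(z1 + z2) be close to D(z1) + D(z2). The sketch below is only an illustration of those definitions, not the authors' model, training procedure, or evaluation protocol. The helper names (`homogeneity_error`, `additivity_error`), the random matrices standing in for an encoder and decoder, and the relative-error metric are all assumptions made for this example.

```python
import numpy as np


def homogeneity_error(encode, x, gain):
    """Relative error between encode(gain * x) and gain * encode(x)."""
    lhs = encode(gain * x)
    rhs = gain * encode(x)
    return float(np.linalg.norm(lhs - rhs) / (np.linalg.norm(rhs) + 1e-12))


def additivity_error(decode, z1, z2):
    """Relative error between decode(z1 + z2) and decode(z1) + decode(z2)."""
    lhs = decode(z1 + z2)
    rhs = decode(z1) + decode(z2)
    return float(np.linalg.norm(lhs - rhs) / (np.linalg.norm(rhs) + 1e-12))


# Hypothetical stand-ins for an audio encoder/decoder (assumption: toy matrices,
# 256-sample "audio" frames and 16-dimensional latents).
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((16, 256))
W_dec = rng.standard_normal((256, 16))


def linear_encode(x):
    return W_enc @ x


def linear_decode(z):
    return W_dec @ z


def nonlinear_encode(x):
    # The tanh makes the map non-linear, so the property should fail.
    return np.tanh(W_enc @ x)


def nonlinear_decode(z):
    return np.tanh(W_dec @ z)


x = rng.standard_normal(256)
z1 = rng.standard_normal(16)
z2 = rng.standard_normal(16)

lin_h = homogeneity_error(linear_encode, x, gain=0.5)
nonlin_h = homogeneity_error(nonlinear_encode, x, gain=0.5)
lin_a = additivity_error(linear_decode, z1, z2)
nonlin_a = additivity_error(nonlinear_decode, z1, z2)

print(f"homogeneity error: linear={lin_h:.2e}, non-linear={nonlin_h:.2e}")
print(f"additivity error:  linear={lin_a:.2e}, non-linear={nonlin_a:.2e}")
```

With a trained autoencoder, the same two functions could be applied to its real encoder and decoder. Errors near zero would indicate the linear behavior the abstract describes, under the assumption that a relative L2 error is an acceptable proxy for it.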