Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization

2510.23530v1 cs.SD, cs.AI, cs.LG, eess.AS 2025-10-29

Authors:

Bernardo Torres, Manuel Moussallam, Gabriel Meseguer-Brocal

Abstract

Audio autoencoders learn useful, compressed audio representations, but their non-linear latent spaces prevent intuitive algebraic manipulation such as mixing or scaling. We introduce a simple training methodology that uses data augmentation to induce linearity in a high-compression Consistency Autoencoder (CAE), specifically homogeneity (equivariance to scalar gain) and additivity (the decoder preserves addition), without altering the model's architecture or loss function. When trained with our method, the CAE exhibits linear behavior in both the encoder and decoder while preserving reconstruction fidelity. We test the practical utility of our learned space on music source composition and separation via simple latent arithmetic. This work presents a straightforward technique for constructing structured latent spaces, enabling more intuitive and efficient audio processing.
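
The two properties named in the abstract can be stated as simple numerical checks. The sketch below is illustrative only and is not the authors' code: the `encode` and `decode` callables are hypothetical stand-ins (a random linear map, its pseudo-inverse, and a `tanh` variant), chosen so that the checks run without a trained model. The abstract does not describe the augmentation itself, so none is shown here.

```python
import numpy as np


def homogeneity_error(encode, x, gain):
    """Relative error between encode(gain * x) and gain * encode(x)."""
    lhs = encode(gain * x)
    rhs = gain * encode(x)
    return np.linalg.norm(lhs - rhs) / (np.linalg.norm(rhs) + 1e-12)


def additivity_error(decode, z_a, z_b):
    """Relative error between decode(z_a + z_b) and decode(z_a) + decode(z_b)."""
    lhs = decode(z_a + z_b)
    rhs = decode(z_a) + decode(z_b)
    return np.linalg.norm(lhs - rhs) / (np.linalg.norm(rhs) + 1e-12)


# Hypothetical stand-ins for a trained autoencoder (assumption, not the CAE):
# a random linear map from 256 samples to 16 latent dimensions.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 256))
W_pinv = np.linalg.pinv(W)


def linear_encode(x):
    return W @ x


def linear_decode(z):
    return W_pinv @ z


def nonlinear_encode(x):
    # A saturating non-linearity breaks equivariance to gain.
    return np.tanh(W @ x)


x_a = rng.standard_normal(256)  # toy "source" signals
x_b = rng.standard_normal(256)

err_lin = homogeneity_error(linear_encode, x_a, gain=0.5)
err_nonlin = homogeneity_error(nonlinear_encode, x_a, gain=0.5)
err_add = additivity_error(linear_decode, linear_encode(x_a), linear_encode(x_b))

# Separation by latent arithmetic: subtract one source's latent from the mix.
z_mix = linear_encode(x_a + x_b)
z_b_est = z_mix - linear_encode(x_a)
sep_err = np.linalg.norm(z_b_est - linear_encode(x_b))

print(f"homogeneity error (linear encoder):     {err_lin:.2e}")
print(f"homogeneity error (non-linear encoder): {err_nonlin:.2e}")
print(f"additivity error (linear decoder):      {err_add:.2e}")
print(f"latent separation error (linear):       {sep_err:.2e}")
```

For the linear stand-in all errors sit at floating-point precision, while the `tanh` encoder fails the homogeneity check. With a real model, the same two functions could be applied to its encoder and decoder to quantify how close the learned mappings are to linear.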

Related articles

Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation...

Schrödinger Bridge Mamba for One-Step Speech Enhancement

Automatic Music Sample Identification with Multi-Track Contrastive Learning

Leveraging Whisper Embeddings for Audio-based Lyrics Matching

SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-...
