Lightweight and Generalizable Acoustic Scene Representations via Contrastive Fine-Tuning and Distillation
2510.03728v1
cs.SD, cs.LG, eess.AS, eess.SP
2025-10-08
Authors:
Kuang Yuan, Yang Gao, Xilin Li, Xinhao Mei, Syavosh Zadissa, Tarun Pruthi, Saeed Bagheri Sereshki
Abstract
Acoustic scene classification (ASC) models on edge devices typically operate
under fixed class assumptions, lacking the transferability needed for
real-world applications that require adaptation to new or refined acoustic
categories. We propose ContrastASC, which learns generalizable acoustic scene
representations by structuring the embedding space to preserve semantic
relationships between scenes, enabling adaptation to unseen categories without
retraining. Our approach combines supervised contrastive fine-tuning of
pre-trained models with contrastive representation distillation to transfer
this structured knowledge to compact student models. Our evaluation shows that
ContrastASC improves few-shot adaptation to unseen categories while
maintaining strong closed-set performance.
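
To illustrate what "supervised contrastive fine-tuning" can mean in practice, the following is a minimal NumPy sketch of the widely used supervised contrastive (SupCon) loss, which pulls embeddings of the same scene class together and pushes other classes apart. The abstract does not specify the exact loss, so this formulation, the function name `supervised_contrastive_loss`, and the default `temperature=0.1` are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss over one batch (illustrative assumption).

    embeddings: array of shape (n, d), one row per audio clip.
    labels:     array of shape (n,), integer scene-class ids.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    n = z.shape[0]

    self_mask = np.eye(n, dtype=bool)
    # Positives: same label as the anchor, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        # No anchor has a positive in this batch; nothing to contrast.
        return 0.0

    # L2-normalize so the dot product is cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature

    # Log-softmax over all other samples (the anchor itself is excluded).
    sim_others = np.where(self_mask, -np.inf, sim)
    row_max = sim_others.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(sim_others - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = sim - log_denom

    # Average log-probability of the positives, per anchor that has any.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(per_anchor.mean())


if __name__ == "__main__":
    # Toy 2-D embeddings: two tight clusters, one per scene class.
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(supervised_contrastive_loss(emb, np.array([0, 0, 1, 1])))
    print(supervised_contrastive_loss(emb, np.array([0, 1, 0, 1])))
```

When labels agree with the cluster structure of the embeddings, the loss is small; when they conflict, it is large. That is the sense in which a loss of this kind shapes the embedding space around class relationships.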