Emo-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition
2509.25495v1
cs.SD, cs.AI
2025-10-02
Authors:
Jiacheng Shi, Hongfei Du, Y. Alicia Hong, Ye Gao
Abstract
Speech emotion recognition (SER) with audio-language models (ALMs) remains
vulnerable to distribution shifts at test time, leading to performance
degradation in out-of-domain scenarios. Test-time adaptation (TTA) provides a
promising solution but often relies on gradient-based updates or prompt tuning,
limiting flexibility and practicality. We propose Emo-TTA, a lightweight,
training-free adaptation framework that incrementally updates class-conditional
statistics via an Expectation-Maximization procedure for explicit test-time
distribution estimation, using ALM predictions as priors. Emo-TTA operates on
individual test samples without modifying model weights. Experiments on six
out-of-domain SER benchmarks show consistent accuracy improvements over prior
TTA baselines, demonstrating the effectiveness of statistical adaptation in
aligning model predictions with evolving test distributions.
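The incremental update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the statistical model, so the code below assumes one Gaussian per emotion class with a fixed, shared isotropic variance, and treats each initial class mean as one pseudo-observation. The class name `OnlineClassStats`, its methods, and the `var` parameter are all hypothetical. The E-step combines the ALM's predicted class probabilities (used as the prior) with the current class-conditional likelihoods; the M-step folds the single test sample into the running means. No model weights or gradients are involved.

```python
import numpy as np


class OnlineClassStats:
    """Hypothetical running class-conditional statistics (assumed Gaussian)."""

    def __init__(self, init_means, var=1.0):
        # (C, D) array: one mean vector per class, e.g. from text embeddings.
        self.means = np.asarray(init_means, dtype=float).copy()
        # Soft counts; each initial mean counts as one pseudo-sample (assumption).
        self.counts = np.ones(len(self.means))
        # Fixed shared isotropic variance (assumption, not from the paper).
        self.var = float(var)

    def e_step(self, x, prior):
        """Posterior over classes: prior times Gaussian likelihood, normalized."""
        sq_dist = ((x - self.means) ** 2).sum(axis=1)
        log_post = np.log(np.asarray(prior) + 1e-12) - 0.5 * sq_dist / self.var
        log_post -= log_post.max()  # subtract max for numerical stability
        post = np.exp(log_post)
        return post / post.sum()

    def m_step(self, x, post):
        """Fold one sample into the running means, weighted by its posterior."""
        self.counts += post
        self.means += (post / self.counts)[:, None] * (x - self.means)

    def adapt(self, x, prior):
        """Process a single test sample: predict, then update the statistics."""
        post = self.e_step(np.asarray(x, dtype=float), prior)
        self.m_step(np.asarray(x, dtype=float), post)
        return post
```

In use, `x` would be the audio embedding of one test utterance and `prior` the ALM's zero-shot class probabilities for it; `adapt` returns the adapted posterior, and the stored means drift toward the test distribution as samples arrive one at a time.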