MSMT-FN: Multi-segment Multi-task Fusion Network for Marketing Audio Classification

2511.11006v1 cs.SD, cs.AI 2025-11-17

Авторы:

HongYu Liu, Ruijie Wan, Yueju Han, Junxin Li, Liuxing Lu, Chao He, Lihua Cai

Abstract

Audio classification plays an essential role in sentiment analysis and emotion recognition, especially for analyzing customer attitudes in marketing phone calls. Efficiently categorizing customer purchasing propensity from large volumes of audio data remains challenging. In this work, we propose a novel Multi-Segment Multi-Task Fusion Network (MSMT-FN) that is uniquely designed for addressing this business demand. Evaluations conducted on our proprietary MarketCalls dataset, as well as established benchmarks (CMU-MOSI, CMU-MOSEI, and MELD), show MSMT-FN consistently outperforms or matches state-of-the-art methods. Additionally, our newly curated MarketCalls dataset will be available upon request, and the code base is made accessible at GitHub Repository MSMT-FN, to facilitate further research and advancements in audio classification domain.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

MSMT-FN: Multi-segment Multi-task Fusion Network for Marketing Audio Classification

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Large Speech Model Enabled Semantic Communication

YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-...

YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GR...

Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio ...

State Space Models for Bioacoustics: A comparative Evaluation with Transformers

Навигация