📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня
Авторы:

Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Existing Persian speech datasets are typically smaller than their English counterparts, which creates a key limitation for developing Persian speech technologies. We address this gap by introducing ParsVoice, the largest Persian speech corpus designed specifically for text-to-speech(TTS) applications. We created an automated pipeline that transforms raw audiobook content into TTS-ready data, incorporating components such as a BERT-based sentence completion detector, a binary search boundary opti...
ID: 2510.10774v2 cs.SD, cs.AI, cs.HC, cs.LG
Авторы:

Upasana Tiwari, Rupayan Chakraborty, Sunil Kumar Kopparapu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Effectiveness of speech emotion recognition in real-world scenarios is often hindered by noisy environments and variability across datasets. This paper introduces a two-step approach to enhance the robustness and generalization of speech emotion recognition models through improved representation learning. First, our model employs EDRL (Emotion-Disentangled Representation Learning) to extract class-specific discriminative features while preserving shared similarities across emotion categories. Ne...
ID: 2510.09072v1 cs.SD, cs.AI, cs.HC, cs.LG, eess.AS
Авторы:

Balthazar Bujard, Jérôme Nika, Fédéric Bevilacqua, Nicolas Obin

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
This paper presents the first step in a research project situated within the field of musical agents. The objective is to achieve, through training, the tuning of the desired musical relationship between a live musical input and a real-time generated musical output, through the curation of a database of separated tracks. We propose an architecture integrating a symbolic decision module capable of learning and exploiting musical relationships from such musical corpus. We detail an offline impleme...
ID: 2509.25296v1 cs.SD, cs.AI, cs.HC, cs.LG, eess.AS