📄 ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis
2025-10-16
Authors:
Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery
Abstract:
Existing Persian speech datasets are typically smaller than their English
counterparts, which creates a key limitation for developing Persian speech
technologies. We address this gap by introducing ParsVoice, the largest Persian
speech corpus designed specifically for text-to-speech (TTS) applications. We
created an automated pipeline that transforms raw audiobook content into
TTS-ready data, incorporating components such as a BERT-based sentence
completion detector, a binary search boundary opti...
Authors:
Upasana Tiwari, Rupayan Chakraborty, Sunil Kumar Kopparapu
Abstract:
The effectiveness of speech emotion recognition in real-world scenarios is often
hindered by noisy environments and variability across datasets. This paper
introduces a two-step approach to enhance the robustness and generalization of
speech emotion recognition models through improved representation learning.
First, our model employs EDRL (Emotion-Disentangled Representation Learning) to
extract class-specific discriminative features while preserving shared
similarities across emotion categories. Ne...
Authors:
Balthazar Bujard, Jérôme Nika, Frédéric Bevilacqua, Nicolas Obin
Abstract:
This paper presents the first step in a research project situated within the
field of musical agents. The objective is to achieve, through training, the
tuning of the desired musical relationship between a live musical input and a
real-time generated musical output, through the curation of a database of
separated tracks. We propose an architecture integrating a symbolic decision
module capable of learning and exploiting musical relationships from such a
musical corpus. We detail an offline impleme...