📊 Digest statistics

Total digests: 34022 | Added today: 82

Last updated: today

📄 SAND Challenge: Four Approaches for Dysarthria Severity Classification

2025-12-04

Authors:

Gauri Deshpande, Harish Battula, Ashish Panda, Sunil Kumar Kopparapu

Annotation:

This paper presents a unified study of four distinct modeling approaches for classifying dysarthria severity in the Speech Analysis for Neurodegenerative Diseases (SAND) challenge. All models tackle the same five-class classification task using a common dataset of speech recordings. We investigate: (1) a ViT-OF method leveraging a Vision Transformer on spectrogram images, (2) a 1D-CNN approach using eight 1-D CNNs with majority-vote fusion, (3) a BiLSTM-OF approach using nine BiLSTM models with...

ID: 2512.02669v1 cs.SD, cs.AI, cs.LG

arXiv PDF
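
The majority-vote fusion used by the 1D-CNN approach above can be illustrated with a minimal sketch. It assumes that each of the eight models outputs one integer severity label per recording. The labels 1 to 5 are an assumption. Ties are broken toward the smallest label. The abstract does not state the paper's tie-breaking rule, so `majority_vote` is a hypothetical helper and not the authors' code.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[int]) -> int:
    """Return the most frequent label among per-model predictions.

    Ties go to the smallest label (an assumption, not the paper's rule).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    # among the labels that share the top count, pick the smallest
    return min(label for label, n in counts.items() if n == top)


# Hypothetical severity labels from eight 1-D CNNs for one recording.
votes = [3, 3, 2, 3, 4, 2, 3, 5]
print(majority_vote(votes))  # 3
```

The smallest-label tie-break is arbitrary but deterministic. When the models expose class probabilities, averaging them is a common alternative to hard voting.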

📄 Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals

2025-12-02

Authors:

S M Asiful Islam Saky, Md Rashidul Islam, Md Saiful Arefin, Shahaba Alam

Annotation:

Respiratory diseases remain major global health challenges, and traditional auscultation is often limited by subjectivity, environmental noise, and inter-clinician variability. This study presents an explainable multimodal deep learning framework for automatic lung-disease detection using respiratory audio signals. The proposed system integrates two complementary representations: a spectral-temporal encoder based on a CNN-BiLSTM Attention architecture, and a handcrafted acoustic-feature encoder ...

ID: 2512.00563v1 cs.SD, cs.AI, cs.LG

arXiv PDF
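
The abstract is truncated before it lists the features used by its handcrafted acoustic-feature encoder. The sketch below is purely illustrative. It computes two classic frame-level descriptors, RMS energy and zero-crossing rate, and summarizes them into a fixed-length vector. It assumes 16 kHz mono audio, 25 ms frames (400 samples) and a 10 ms hop (160 samples). `handcrafted_features` is a hypothetical name.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames, dropping the ragged tail."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def handcrafted_features(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Hypothetical 4-D summary: mean/std of RMS energy and zero-crossing rate."""
    frames = frame_signal(x, frame_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    neg = np.signbit(frames)
    zcr = np.mean(neg[:, 1:] != neg[:, :-1], axis=1)
    return np.array([rms.mean(), rms.std(), zcr.mean(), zcr.std()])


sr = 16_000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100 * t)                # 1 s, 100 Hz sine
noise = np.random.default_rng(0).normal(size=sr)  # 1 s white noise
print(handcrafted_features(tone))   # RMS near 0.707, low ZCR
print(handcrafted_features(noise))  # RMS near 1.0, ZCR near 0.5
```

A real respiratory-sound pipeline would likely add spectral descriptors such as MFCCs. The point here is only how frame-level features become one vector that a second encoder can consume.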

📄 Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection

2025-12-01

Authors:

Bruno Padovese, Fabio Frazao, Michael Dowd, Ruth Joy

Annotation:

Automated detection and classification of marine mammal vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy to address this limitation by increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but rel...

ID: 2511.21872v1 cs.SD, cs.AI, cs.LG, eess.AS

arXiv PDF
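
Mixing background noise at a target signal-to-noise ratio is one widely used traditional augmentation for acoustic detectors. The sketch below illustrates that baseline idea. It is not the hybrid generative strategy the paper proposes. It assumes both clips are NumPy arrays at the same sample rate. `mix_at_snr` and the example signals are hypothetical.

```python
import numpy as np


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return signal + g * noise, with g chosen so the mixture has the given SNR (dB)."""
    if len(noise) < len(signal):
        raise ValueError("noise clip must be at least as long as the signal")
    noise = noise[: len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10*log10(p_signal / (g**2 * p_noise)), solved for the gain g
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise


rng = np.random.default_rng(1)
call = np.sin(2 * np.pi * 1500 * np.arange(8000) / 8000)  # hypothetical 1 s tonal call at 8 kHz
ambient = rng.normal(size=12000)                          # hypothetical background noise
augmented = mix_at_snr(call, ambient, snr_db=10.0)
```

Sweeping `snr_db` over a range, for example from 0 to 20 dB, produces progressively harder copies of each labelled call from a single annotation.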

📄 Preference-Based Learning in Audio Applications: A Systematic Analysis

2025-11-19

Authors:

Aaron Broukhim, Yiran Shen, Prithviraj Ammanabrolu, Nadir Weibel

Annotation:

Despite the parallel challenges that audio and text domains face in evaluating generative model outputs, preference learning remains remarkably underexplored in audio applications. Through a PRISMA-guided systematic review of approximately 500 papers, we find that only 30 (6%) apply preference learning to audio tasks. Our analysis reveals a field in transition: pre-2021 works focused on emotion recognition using traditional ranking methods (rankSVM), while post-2021 studies have pivoted toward g...

ID: 2511.13936v1 cs.SD, cs.AI, cs.LG

arXiv PDF
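
The rankSVM methods mentioned above reduce ranking to binary classification over pairwise differences. If item i is preferred to item j, the vector x_i - x_j should score positive under a linear model w. Below is a minimal NumPy sketch. It assumes real-valued preference grades `y` and uses a plain subgradient solver in place of a dedicated SVM library. The function names, data and hyperparameters are illustrative.

```python
import numpy as np


def pairwise_transform(X: np.ndarray, y: np.ndarray):
    """RankSVM-style reduction: one difference vector per pair with unequal grades.

    Pair (i, j) yields x_i - x_j with label +1 if item i is preferred
    (higher y) and -1 otherwise; tied pairs carry no preference and are skipped.
    """
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] == y[j]:
                continue
            diffs.append(X[i] - X[j])
            labels.append(1.0 if y[i] > y[j] else -1.0)
    return np.array(diffs), np.array(labels)


def train_linear_ranker(D, s, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2*||w||^2 + mean(max(0, 1 - s * D @ w)) by subgradient descent."""
    w = np.zeros(D.shape[1])
    for _ in range(epochs):
        active = s * (D @ w) < 1.0  # pairs inside the margin contribute to the hinge
        grad = lam * w - (s[active][:, None] * D[active]).sum(axis=0) / len(s)
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))        # hypothetical features of 20 audio clips
y = X @ np.array([2.0, -1.0, 0.5])  # hypothetical listener preference scores
D, s = pairwise_transform(X, y)
w = train_linear_ranker(D, s)
print(np.mean(np.sign(D @ w) == s))  # fraction of pairs ranked correctly
```

No bias term is needed, because any bias cancels in x_i - x_j. At inference, items are simply sorted by X @ w.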

📄 MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers

2025-11-08

Authors:

Ali Boudaghi, Hadi Zare

Annotation:

Music editing has emerged as an important and practical area of artificial intelligence, with applications ranging from video game and film music production to personalizing existing tracks according to user preferences. However, existing models face significant limitations, such as being restricted to editing synthesized music generated by their own models, requiring highly precise prompts, or necessitating task-specific retraining, thus lacking true zero-shot capability. Leveraging recent adva...

ID: 2511.04376v1 cs.SD, cs.AI, cs.LG, cs.MM, eess.AS

arXiv PDF
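
MusRec builds on rectified flow. The sketch below shows the generic formulation, not the paper's editing pipeline. A model learns a velocity that moves a noise sample x0 to a data sample x1 along the straight line x_t = (1 - t) x0 + t x1. That line has constant velocity x1 - x0, which serves as the regression target. Sampling integrates the learned velocity with an ODE solver such as Euler. Here t = 0 is taken as noise and t = 1 as data; some implementations use the reverse convention. The `oracle` velocity is a hypothetical stand-in for a trained Diffusion Transformer.

```python
import numpy as np


def interpolate(x0: np.ndarray, x1: np.ndarray, t: float) -> np.ndarray:
    """Point on the straight path from noise x0 (t = 0) to data x1 (t = 1)."""
    return (1.0 - t) * x0 + t * x1


def velocity_target(x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """Regression target for the velocity model: d/dt of the path, constant in t."""
    return x1 - x0


def euler_sample(v, x0: np.ndarray, n_steps: int) -> np.ndarray:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v(x, k * dt)
    return x


rng = np.random.default_rng(0)
x0 = rng.normal(size=4)               # noise sample
x1 = np.array([0.5, -1.0, 2.0, 0.0])  # hypothetical "data" latent


def oracle(x: np.ndarray, t: float) -> np.ndarray:
    """Stand-in for a trained velocity network: exact for this single (x0, x1) pair."""
    return velocity_target(x0, x1)


print(np.allclose(euler_sample(oracle, x0, n_steps=1), x1))  # True
```

The oracle path is exactly straight, so a single Euler step lands on x1. A learned velocity only approximates this, which is why straighter flows tolerate fewer sampling steps.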

📄 Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization

2025-10-29

Authors:

Bernardo Torres, Manuel Moussallam, Gabriel Meseguer-Brocal

Annotation:

Audio autoencoders learn useful, compressed audio representations, but their non-linear latent spaces prevent intuitive algebraic manipulation such as mixing or scaling. We introduce a simple training methodology to induce linearity in a high-compression Consistency Autoencoder (CAE) by using data augmentation, thereby inducing homogeneity (equivariance to scalar gain) and additivity (the decoder preserves addition) without altering the model's architecture or loss function. When trained with ou...

ID: 2510.23530v1 cs.SD, cs.AI, cs.LG, eess.AS

arXiv PDF
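
The two properties named in the abstract above can be measured directly. This sketch scores homogeneity, comparing encode(a·x) with a·encode(x). It also scores decoder additivity, comparing decode(z1 + z2) with decode(z1) + decode(z2). Both are reported as relative errors. The toy encoders are built from a random matrix `W` and are hypothetical stand-ins for a trained Consistency Autoencoder. The linear pair should score near zero and the tanh pair should not.

```python
import numpy as np


def rel_err(a: np.ndarray, b: np.ndarray) -> float:
    """Relative L2 error of a against reference b."""
    return float(np.linalg.norm(a - b) / (np.linalg.norm(b) + 1e-12))


def homogeneity_error(encode, x: np.ndarray, gain: float) -> float:
    """How far encode(gain * x) is from gain * encode(x)."""
    return rel_err(encode(gain * x), gain * encode(x))


def additivity_error(encode, decode, x1: np.ndarray, x2: np.ndarray) -> float:
    """How far decode(z1 + z2) is from decode(z1) + decode(z2)."""
    z1, z2 = encode(x1), encode(x2)
    return rel_err(decode(z1 + z2), decode(z1) + decode(z2))


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 32))  # hypothetical 32-sample "audio" -> 8-D latent
W_pinv = np.linalg.pinv(W)
a, b = rng.normal(size=32), rng.normal(size=32)


def lin_enc(x):
    return W @ x


def lin_dec(z):
    return W_pinv @ z


def tanh_enc(x):
    return np.tanh(W @ x)


def tanh_dec(z):
    return np.tanh(W.T @ z)


print(homogeneity_error(lin_enc, a, 0.5), homogeneity_error(tanh_enc, a, 0.5))
print(additivity_error(lin_enc, lin_dec, a, b), additivity_error(tanh_enc, tanh_dec, a, b))
```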

📄 Schrödinger Bridge Mamba for One-Step Speech Enhancement

2025-10-22

Authors:

Jing Yang, Sirui Wang, Chao Wu, Fan Fan

Annotation:

We propose Schrödinger Bridge Mamba (SBM), a new concept of training-inference framework motivated by the inherent compatibility between Schrödinger Bridge (SB) training paradigm and selective state-space model Mamba. We exemplify the concept of SBM with an implementation for generative speech enhancement. Experiments on a joint denoising and dereverberation task using four benchmark datasets demonstrate that SBM, with only 1-step inference, outperforms strong baselines with 1-step or iterat...

ID: 2510.16834v1 cs.SD, cs.AI, cs.LG, eess.AS

arXiv PDF

📄 Beat Tracking as Object Detection

2025-10-20

Authors:

Jaehoon Ahn, Moon-Ryul Jung

Annotation:

Recent beat and downbeat tracking models (e.g., RNNs, TCNs, Transformers) output frame-level activations. We propose reframing this task as object detection, where beats and downbeats are modeled as temporal "objects." Adapting the FCOS detector from computer vision to 1D audio, we replace its original backbone with WaveBeat's temporal feature extractor and add a Feature Pyramid Network to capture multi-scale temporal patterns. The model predicts overlapping beat/downbeat intervals with confiden...

ID: 2510.14391v2 cs.SD, cs.AI, cs.LG

arXiv PDF
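
In FCOS-style detection, overlapping candidate predictions are usually pruned with non-maximum suppression (NMS). The abstract above is cut off, so whether the paper uses exactly this step is an assumption. The sketch is a one-dimensional greedy NMS over (start, end, confidence) intervals in seconds, with a hypothetical IoU threshold of 0.5. The candidate intervals are made up for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float, float]  # (start_s, end_s, confidence)


def iou_1d(a: Interval, b: Interval) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(intervals: List[Interval], iou_threshold: float = 0.5) -> List[Interval]:
    """Greedy NMS: keep the most confident interval, drop heavy overlaps, repeat."""
    remaining = sorted(intervals, key=lambda iv: iv[2], reverse=True)
    kept: List[Interval] = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [iv for iv in remaining if iou_1d(best, iv) < iou_threshold]
    return sorted(kept, key=lambda iv: iv[0])  # back to temporal order


# Hypothetical overlapping beat candidates from a detector head.
candidates = [(0.50, 0.95, 0.9), (0.52, 0.97, 0.6), (1.00, 1.45, 0.8), (1.02, 1.48, 0.7)]
kept = nms_1d(candidates)
print(kept)  # [(0.5, 0.95, 0.9), (1.0, 1.45, 0.8)]
```

Reading each kept interval's start as the beat time is one possible decoding choice, not necessarily the paper's.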

📄 Beat Detection as Object Detection

2025-10-18

Authors:

Jaehoon Ahn, Moon-Ryul Jung

Annotation:

ID: 2510.14391v1 cs.SD, cs.AI, cs.LG

arXiv PDF

📄 Automatic Music Sample Identification with Multi-Track Contrastive Learning

2025-10-15

Authors:

Alain Riou, Joan Serrà, Yuki Mitsufuji

Annotation:

Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and design a novel contrastive learn...

ID: 2510.11507v1 cs.SD, cs.AI, cs.LG, eess.AS

arXiv PDF
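
The abstract's novel contrastive loss is truncated. The sketch below therefore substitutes the standard InfoNCE objective that many contrastive retrieval systems start from. Row i of `anchors` is the positive for row i of `positives`, for example the embeddings of two artificial mixes built from the same material. Every other row in the batch acts as a negative. The temperature of 0.1 and the function name are assumptions.

```python
import numpy as np


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE: row i of `anchors` should match row i of `positives` among all rows."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability; softmax unchanged
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # positives sit on the diagonal


rng = np.random.default_rng(0)
z = rng.normal(size=(16, 32))                                # hypothetical batch of embeddings
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))   # matching pairs
mismatched = info_nce(z, np.roll(z, 1, axis=0))              # every pair shifted by one row
print(aligned, mismatched)  # aligned is much smaller
```

Larger batches supply more negatives per positive, which is one reason contrastive audio models are often trained with large batch sizes.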

Showing 1-10 of 47 entries