📊 Digest statistics
Total digests: 34022 Added today: 82
Last updated: today
Authors:
Gauri Deshpande, Harish Battula, Ashish Panda, Sunil Kumar Kopparapu
Russian summary not found
Available fields: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
This paper presents a unified study of four distinct modeling approaches for classifying dysarthria severity in the Speech Analysis for Neurodegenerative Diseases (SAND) challenge. All models tackle the same five-class classification task using a common dataset of speech recordings. We investigate: (1) a ViT-OF method leveraging a Vision Transformer on spectrogram images, (2) a 1D-CNN approach using eight 1-D CNNs with majority-vote fusion, (3) a BiLSTM-OF approach using nine BiLSTM models with...
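As a rough illustration of the majority-vote fusion mentioned in the abstract above, the sketch below fuses per-model class labels for a single recording. The vote values and the tie-breaking rule (smallest label wins) are assumptions made for this example; the excerpt does not say how the authors resolve ties.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent class label among per-model predictions.

    Ties are broken in favor of the smallest label. This tie rule is an
    assumption for illustration, not something stated in the abstract.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)

# Hypothetical labels from eight models for one recording (classes 0..4).
votes = [2, 2, 3, 2, 1, 3, 2, 4]
fused = majority_vote(votes)
print(fused)
```

With an even number of voters, ties are possible, so any real pipeline would need some explicit tie rule like the one assumed here.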
Authors:
S M Asiful Islam Saky, Md Rashidul Islam, Md Saiful Arefin, Shahaba Alam
Russian summary not found
Annotation:
Respiratory diseases remain major global health challenges, and traditional auscultation is often limited by subjectivity, environmental noise, and inter-clinician variability. This study presents an explainable multimodal deep learning framework for automatic lung-disease detection using respiratory audio signals. The proposed system integrates two complementary representations: a spectral-temporal encoder based on a CNN-BiLSTM Attention architecture, and a handcrafted acoustic-feature encoder ...
Authors:
Bruno Padovese, Fabio Frazao, Michael Dowd, Ruth Joy
Russian summary not found
Annotation:
Automated detection and classification of marine mammal vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic complexity of real-world marine environments. Data augmentation has proven to be an effective strategy to address this limitation by increasing dataset diversity and improving model generalization without requiring additional field data. However, most augmentation techniques used to date rely on effective but rel...
Authors:
Aaron Broukhim, Yiran Shen, Prithviraj Ammanabrolu, Nadir Weibel
Russian summary not found
Annotation:
Despite the parallel challenges that audio and text domains face in evaluating generative model outputs, preference learning remains remarkably underexplored in audio applications. Through a PRISMA-guided systematic review of approximately 500 papers, we find that only 30 (6%) apply preference learning to audio tasks. Our analysis reveals a field in transition: pre-2021 works focused on emotion recognition using traditional ranking methods (rankSVM), while post-2021 studies have pivoted toward g...
Authors:
Ali Boudaghi, Hadi Zare
Russian summary not found
Annotation:
Music editing has emerged as an important and practical area of artificial
intelligence, with applications ranging from video game and film music
production to personalizing existing tracks according to user preferences.
However, existing models face significant limitations, such as being restricted
to editing synthesized music generated by their own models, requiring highly
precise prompts, or necessitating task-specific retraining, thus lacking true
zero-shot capability. Leveraging recent adva...
Authors:
Bernardo Torres, Manuel Moussallam, Gabriel Meseguer-Brocal
Russian summary not found
Annotation:
Audio autoencoders learn useful, compressed audio representations, but their
non-linear latent spaces prevent intuitive algebraic manipulation such as
mixing or scaling. We introduce a simple training methodology to induce
linearity in a high-compression Consistency Autoencoder (CAE) by using data
augmentation, thereby inducing homogeneity (equivariance to scalar gain) and
additivity (the decoder preserves addition) without altering the model's
architecture or loss function. When trained with ou...
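One way to read the two properties named in the abstract above is as numerical checks on a decoder: homogeneity means decoding a gain-scaled latent matches the gain-scaled output, and additivity means decoding a sum of latents matches the sum of the decoded outputs. The sketch below measures both errors on toy stand-in decoders. The functions `decode_linear` and `decode_nonlinear` are hypothetical and have nothing to do with the actual Consistency Autoencoder.

```python
import math

def decode_linear(z):
    """Toy linear decoder (hypothetical): a fixed 3x2 matrix applied to z."""
    weights = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
    return [sum(w * x for w, x in zip(row, z)) for row in weights]

def decode_nonlinear(z):
    """Toy non-linear decoder (hypothetical): tanh applied to the linear output."""
    return [math.tanh(v) for v in decode_linear(z)]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def homogeneity_error(decode, z, gain):
    """Gap between decode(gain * z) and gain * decode(z)."""
    scaled_latent = [gain * x for x in z]
    scaled_output = [gain * y for y in decode(z)]
    return max_abs_diff(decode(scaled_latent), scaled_output)

def additivity_error(decode, z1, z2):
    """Gap between decode(z1 + z2) and decode(z1) + decode(z2)."""
    summed_latent = [a + b for a, b in zip(z1, z2)]
    summed_output = [a + b for a, b in zip(decode(z1), decode(z2))]
    return max_abs_diff(decode(summed_latent), summed_output)

z_a, z_b = [0.3, -0.7], [1.0, 0.5]
print(homogeneity_error(decode_linear, z_a, 2.0))
print(additivity_error(decode_linear, z_a, z_b))
print(homogeneity_error(decode_nonlinear, z_a, 2.0))
```

Both errors should be near zero for the linear toy decoder and clearly non-zero for the tanh one, which is the kind of gap the described training method aims to close.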
Authors:
Jing Yang, Sirui Wang, Chao Wu, Fan Fan
Russian summary not found
Annotation:
We propose Schrödinger Bridge Mamba (SBM), a new concept of
training-inference framework motivated by the inherent compatibility between
Schrödinger Bridge (SB) training paradigm and selective state-space model
Mamba. We exemplify the concept of SBM with an implementation for generative
speech enhancement. Experiments on a joint denoising and dereverberation task
using four benchmark datasets demonstrate that SBM, with only 1-step inference,
outperforms strong baselines with 1-step or iterat...
📄 Beat Tracking as Object Detection
2025-10-20
Authors:
Jaehoon Ahn, Moon-Ryul Jung
Russian summary not found
Annotation:
Recent beat and downbeat tracking models (e.g., RNNs, TCNs, Transformers)
output frame-level activations. We propose reframing this task as object
detection, where beats and downbeats are modeled as temporal "objects."
Adapting the FCOS detector from computer vision to 1D audio, we replace its
original backbone with WaveBeat's temporal feature extractor and add a Feature
Pyramid Network to capture multi-scale temporal patterns. The model predicts
overlapping beat/downbeat intervals with confiden...
📄 Beat Detection as Object Detection
2025-10-18
Authors:
Jaehoon Ahn, Moon-Ryul Jung
Russian summary not found
Annotation:
Recent beat and downbeat tracking models (e.g., RNNs, TCNs, Transformers)
output frame-level activations. We propose reframing this task as object
detection, where beats and downbeats are modeled as temporal "objects."
Adapting the FCOS detector from computer vision to 1D audio, we replace its
original backbone with WaveBeat's temporal feature extractor and add a Feature
Pyramid Network to capture multi-scale temporal patterns. The model predicts
overlapping beat/downbeat intervals with confiden...
Authors:
Alain Riou, Joan Serrà, Yuki Mitsufuji
Russian summary not found
Annotation:
Sampling, the technique of reusing pieces of existing audio tracks to create
new music content, is a very common practice in modern music production. In
this paper, we tackle the challenging task of automatic sample identification,
that is, detecting such sampled content and retrieving the material from which
it originates. To do so, we adopt a self-supervised learning approach that
leverages a multi-track dataset to create positive pairs of artificial mixes,
and design a novel contrastive learn...
Showing 1-10 of 47 records