📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding

2025-11-18

Авторы:

Yifan Zhuang, Calvin Huang, Zepeng Yu, Yongjie Zou, Jiawei Ju

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Brain-computer interface (BCI) speech decoding has emerged as a promising tool for assisting individuals with speech impairments. In this context, the integration of electroencephalography (EEG) and electromyography (EMG) signals offers strong potential for enhancing decoding performance. Mandarin tone classification presents particular challenges, as tonal variations convey distinct meanings even when phonemes remain identical. In this study, we propose a novel cross-subject multimodal BCI deco...

ID: 2511.10935v1 cs.SD, cs.LG, q-bio.NC

arXiv PDF

📄 SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition

2025-11-15

Авторы:

Jiaqi Wang, Liutao Yu, Xiongri Shen, Sihang Guo, Chenlin Zhou, Leilei Zhao, Yi Zhong, Zhiguo Zhang, Zhengyu Ma

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Spiking neural networks (SNNs) offer a promising path toward energy-efficient speech command recognition (SCR) by leveraging their event-driven processing paradigm. However, existing SNN-based SCR methods often struggle to capture rich temporal dependencies and contextual information from speech due to limited temporal modeling and binary spike-based representations. To address these challenges, we first introduce the multi-view spiking temporal-aware self-attention (MSTASA) module, which combin...

ID: 2511.07883v2 cs.SD, cs.LG

arXiv PDF

📄 Speech Emotion Recognition with Phonation Excitation Information and Articulatory Kinematics

2025-11-15

Авторы:

Ziqian Zhang, Min Huang, Zhongzhe Xiao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Speech emotion recognition (SER) has advanced significantly for the sake of deep-learning methods, while textual information further enhances its performance. However, few studies have focused on the physiological information during speech production, which also encompasses speaker traits, including emotional states. To bridge this gap, we conducted a series of experiments to investigate the potential of the phonation excitation information and articulatory kinematics for SER. Due to the scarcit...

ID: 2511.07955v1 cs.SD, cs.LG

arXiv PDF

📄 Uncertainty Calibration of Multi-Label Bird Sound Classifiers

2025-11-15

Авторы:

Raphael Schwinger, Ben McEwen, Vincent S. Kather, René Heinrich, Lukas Rauch, Sven Tomforde

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Passive acoustic monitoring enables large-scale biodiversity assessment, but reliable classification of bioacoustic sounds requires not only high accuracy but also well-calibrated uncertainty estimates to ground decision-making. In bioacoustics, calibration is challenged by overlapping vocalisations, long-tailed species distributions, and distribution shifts between training and deployment data. The calibration of multi-label deep learning classifiers within the domain of bioacoustics has not ye...

ID: 2511.08261v1 cs.SD, cs.LG

arXiv PDF

📄 SyMuPe: Affective and Controllable Symbolic Music Performance

2025-11-07

Авторы:

Ilya Borovik, Dmitrii Gavrilev, Vladimir Viro

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Emotions are fundamental to the creation and perception of music performances. However, achieving human-like expression and emotion through machine learning models for performance rendering remains a challenging task. In this work, we present SyMuPe, a novel framework for developing and training affective and controllable symbolic piano performance models. Our flagship model, PianoFlow, uses conditional flow matching trained to solve diverse multi-mask performance inpainting tasks. By design, it...

ID: 2511.03425v1 cs.SD, cs.LG, cs.MM

arXiv PDF

📄 ADNAC: Audio Denoiser using Neural Audio Codec

2025-11-06

Авторы:

Daniel Jimon, Mircea Vaida, Adriana Stan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Audio denoising is critical in signal processing, enhancing intelligibility and fidelity for applications like restoring musical recordings. This paper presents a proof-of-concept for adapting a state-of-the-art neural audio codec, the Descript Audio Codec (DAC), for music denoising. This work overcomes the limitations of traditional architectures like U-Nets by training the model on a large-scale, custom-synthesized dataset built from diverse sources. Training is guided by a multi objective los...

ID: 2511.01773v1 cs.SD, cs.LG

arXiv PDF

📄 Representing Classical Compositions through Implication-Realization Temporal-Gestalt Graphs

2025-11-04

Авторы:

A. V. Bomediano, R. J. Conanan, L. D. Santuyo, A. Coronel

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Understanding the structural and cognitive underpinnings of musical compositions remains a key challenge in music theory and computational musicology. While traditional methods focus on harmony and rhythm, cognitive models such as the Implication-Realization (I-R) model and Temporal Gestalt theory offer insight into how listeners perceive and anticipate musical structure. This study presents a graph-based computational approach that operationalizes these models by segmenting melodies into percep...

ID: 2510.27530v1 cs.SD, cs.LG, cs.SI, H.5.5; G.2.2; I.5.4

arXiv PDF

📄 AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch

2025-10-24

Авторы:

Weichuang Shao, Iman Yi Liao, Tomas Henrique Bode Maul, Tissa Chandesa

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent foundational models, SSAST, EAT, HuBERT, Qwen-Audio, and Audio Flamingo, achieve top-tier results across standard audio benchmarks but are limited by fixed input rates and durations, hindering their reusability. This paper introduces the Augmentation-driven Multiview Audio Transformer (AMAuT), a training-from-scratch framework that eliminates the dependency on pre-trained weights while supporting arbitrary sample rates and audio lengths. AMAuT integrates four key components: (1) augmentat...

ID: 2510.19368v1 cs.SD, cs.LG

arXiv PDF

📄 Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware

2025-10-23

Авторы:

Stavros Mitsis, Ermos Hadjikyriakos, Humaid Ibrahim, Savvas Neofytou, Shashwat Raman, James Myles, Eiman Kanjo

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Deploying emotion recognition systems in real-world environments where devices must be small, low-power, and private remains a significant challenge. This is especially relevant for applications such as tension monitoring, conflict de-escalation, and responsive wearables, where cloud-based solutions are impractical. Multimodal emotion recognition has advanced through deep learning, but most systems remain unsuitable for deployment on ultra-constrained edge devices. Prior work typically relies on...

ID: 2510.18036v1 cs.SD, cs.LG, eess.AS

arXiv PDF

📄 AWARE: Audio Watermarking with Adversarial Resistance to Edits

2025-10-22

Авторы:

Kosta Pavlović, Lazar Stanarević, Petar Nedić, Slavko Kovačević, Igor Djurović

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Prevailing practice in learning-based audio watermarking is to pursue robustness by expanding the set of simulated distortions during training. However, such surrogates are narrow and prone to overfitting. This paper presents AWARE (Audio Watermarking with Adversarial Resistance to Edits), an alternative approach that avoids reliance on attack-simulation stacks and handcrafted differentiable distortions. Embedding is obtained via adversarial optimization in the time-frequency domain under a leve...

ID: 2510.17512v1 cs.SD, cs.LG, cs.MM, eess.AS

arXiv PDF

Показано 11 - 20 из 80 записей