📊 Статистика дайджестов

Всего дайджестов: 35039 Добавлено сегодня: 432

Последнее обновление: сегодня

📄 PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis

2025-12-04

Авторы:

Heng Xie, Kang Zhu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Ruibo Fu, Changsheng Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Multimodal sentiment analysis (MSA) is a research field that recognizes human sentiments by combining textual, visual, and audio modalities. The main challenge lies in integrating sentiment-related information from different modalities, which typically arises during the unimodal feature extraction phase and the multimodal feature fusion phase. Existing methods extract only shallow information from unimodal features during the extraction phase, neglecting sentimental differences across different ...

ID: 2512.01442v1 cs.MM, cs.AI

arXiv PDF

📄 Real-Time Mobile Video Analytics for Pre-arrival Emergency Medical Services

2025-11-20

Авторы:

Liuyi Jin, Amran Haroon, Radu Stoleru, Pasan Gunawardena, Michael Middleton, Jeeeun Kim

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Timely and accurate pre-arrival video streaming and analytics are critical for emergency medical services (EMS) to deliver life-saving interventions. Yet, current-generation EMS infrastructure remains constrained by one-to-one video streaming and limited analytics capabilities, leaving dispatchers and EMTs to manually interpret overwhelming, often noisy or redundant information in high-stress environments. We present TeleEMS, a mobile live video analytics system that enables pre-arrival multimod...

ID: 2511.14119v1 cs.MM, cs.AI

arXiv PDF

📄 ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation

2025-11-19

Авторы:

Jiahui Sun, Weining Wang, Mingzhen Sun, Yirong Yang, Xinxin Zhu, Jing Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Sounding Video Generation (SVG) remains a challenging task due to the inherent structural misalignment between audio and video, as well as the high computational cost of multimodal data processing. In this paper, we introduce ProAV-DiT, a Projected Latent Diffusion Transformer designed for efficient and synchronized audio-video generation. To address structural inconsistencies, we preprocess raw audio into video-like representations, aligning both the temporal and spatial dimensions between audi...

ID: 2511.12072v1 cs.MM, cs.AI, cs.SD

arXiv PDF

📄 Can LLMs Create Legally Relevant Summaries and Analyses of Videos?

2025-11-19

Авторы:

Lyra Hoeben-Kuil, Gijs van Dijck, Jaromir Savelka, Johanna Gunawan, Konrad Kollnig, Marta Kolacz, Mindy Duffourc, Shashank Chakravarthy, Hannes Westermann

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Understanding the legally relevant factual basis of an event and conveying it through text is a key skill of legal professionals. This skill is important for preparing forms (e.g., insurance claims) or other legal documents (e.g., court claims), but often presents a challenge for laypeople. Current AI approaches aim to bridge this gap, but mostly rely on the user to articulate what has happened in text, which may be challenging for many. Here, we investigate the capability of large language mode...

ID: 2511.13772v1 cs.MM, cs.AI, cs.CV, cs.CY

arXiv PDF

📄 SynthGuard: An Open Platform for Detecting AI-Generated Multimedia with Multimodal LLMs

2025-11-19

Авторы:

Shail Desai, Aditya Pawar, Li Lin, Xin Wang, Shu Hu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Artificial Intelligence (AI) has made it possible for anyone to create images, audio, and video with unprecedented ease, enriching education, communication, and creative expression. At the same time, the rapid rise of AI-generated media has introduced serious risks, including misinformation, identity misuse, and the erosion of public trust as synthetic content becomes increasingly indistinguishable from real media. Although deepfake detection has advanced, many existing tools remain closed-sourc...

ID: 2511.12404v1 cs.MM, cs.AI, cs.SD

arXiv PDF

📄 Robustness and Imperceptibility Analysis of Hybrid Spatial-Frequency Domain Image Watermarking

2025-11-15

Авторы:

Rizal Khoirul Anam

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The proliferation of digital media necessitates robust methods for copyright protection and content authentication. This paper presents a comprehensive comparative study of digital image watermarking techniques implemented using the spatial domain (Least Significant Bit - LSB), the frequency domain (Discrete Fourier Transform - DFT), and a novel hybrid (LSB+DFT) approach. The core objective is to evaluate the trade-offs between imperceptibility (measured by Peak Signal-to-Noise Ratio - PSNR) and...

ID: 2511.10245v1 cs.MM, cs.AI, eess.IV

arXiv PDF

📄 On the Brittleness of CLIP Text Encoders

2025-11-11

Авторы:

Allie Tran, Luca Rossetto

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Multimodal co-embedding models, especially CLIP, have advanced the state of the art in zero-shot classification and multimedia information retrieval in recent years by aligning images and text in a shared representation space. However, such modals trained on a contrastive alignment can lack stability towards small input perturbations. Especially when dealing with manually expressed queries, minor variations in the query can cause large differences in the ranking of the best-matching results. In ...

ID: 2511.04247v2 cs.MM, cs.AI, cs.IR

arXiv PDF

📄 On the Brittleness of CLIP Text Encoders

2025-11-08

Авторы:

Allie Tran, Luca Rossetto

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

ID: 2511.04247v1 cs.MM, cs.AI, cs.IR

arXiv PDF

📄 Wireless Video Semantic Communication with Decoupled Diffusion Multi-frame Compensation

2025-11-06

Авторы:

Bingyan Xie, Yongpeng Wu, Yuxuan Shi, Biqian Feng, Wenjun Zhang, Jihong Park, Tony Quek

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Existing wireless video transmission schemes directly conduct video coding in pixel level, while neglecting the inner semantics contained in videos. In this paper, we propose a wireless video semantic communication framework with decoupled diffusion multi-frame compensation (DDMFC), abbreviated as WVSC-D, which integrates the idea of semantic communication into wireless video transmission scenarios. WVSC-D first encodes original video frames as semantic frames and then conducts video coding base...

ID: 2511.02478v1 cs.MM, cs.AI

arXiv PDF

📄 EVER: Edge-Assisted Auto-Verification for Mobile MR-Aided Operation

2025-10-23

Авторы:

Jiangong Chen, Mingyu Zhu, Bin Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Mixed Reality (MR)-aided operation overlays digital objects on the physical world to provide a more immersive and intuitive operation process. A primary challenge is the precise and fast auto-verification of whether the user follows MR guidance by comparing frames before and after each operation. The pre-operation frame includes virtual guiding objects, while the post-operation frame contains physical counterparts. Existing approaches fall short of accounting for the discrepancies between physic...

ID: 2510.18224v1 cs.MM, cs.AI

arXiv PDF

Показано 1 - 10 из 28 записей