📊 Статистика дайджестов

Всего дайджестов: 35039 Добавлено сегодня: 432

Последнее обновление: сегодня

📄 Cross-View Cross-Modal Unsupervised Domain Adaptation for Driver Monitoring System

2025-11-18

Авторы:

Aditi Bhalla, Christian Hellert, Enkelejda Kasneci

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Driver distraction remains a leading cause of road traffic accidents, contributing to thousands of fatalities annually across the globe. While deep learning-based driver activity recognition methods have shown promise in detecting such distractions, their effectiveness in real-world deployments is hindered by two critical challenges: variations in camera viewpoints (cross-view) and domain shifts such as change in sensor modality or environment. Existing methods typically address either cross-vie...

ID: 2511.12196v1 cs.CV, cs.HC

arXiv PDF

📄 DEFT-LLM: Disentangled Expert Feature Tuning for Micro-Expression Recognition

2025-11-17

Авторы:

Ren Zhang, Huilai Li, Chao qi, Guoliang Xu, Tianyu Zhou, Wei wei, Jianqin Yin

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Micro expression recognition (MER) is crucial for inferring genuine emotion. Applying a multimodal large language model (MLLM) to this task enables spatio-temporal analysis of facial motion and provides interpretable descriptions. However, there are still two core challenges: (1) The entanglement of static appearance and dynamic motion cues prevents the model from focusing on subtle motion; (2) Textual labels in existing MER datasets do not fully correspond to underlying facial muscle movements,...

ID: 2511.10948v1 cs.CV, cs.HC

arXiv PDF

📄 Exploring the "Great Unseen" in Medieval Manuscripts: Instance-Level Labeling of Legacy Image Collections with Zero-Shot Models

2025-11-15

Авторы:

Christofer Meinecke, Estelle Guéville, David Joseph Wrisley

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We aim to theorize the medieval manuscript page and its contents more holistically, using state-of-the-art techniques to segment and describe the entire manuscript folio, for the purpose of creating richer training data for computer vision techniques, namely instance segmentation, and multimodal models for medieval-specific visual content.

ID: 2511.07004v1 cs.CV, cs.HC

arXiv PDF

📄 A Picture is Worth a Thousand (Correct) Captions: A Vision-Guided Judge-Corrector System for Multimodal Machine Translation

2025-11-15

Авторы:

Siddharth Betala, Kushan Raj, Vipul Betala, Rohan Saswade

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

In this paper, we describe our system under the team name BLEU Monday for the English-to-Indic Multimodal Translation Task at WAT 2025. We participate in the text-only translation tasks for English-Hindi, English-Bengali, English-Malayalam, and English-Odia language pairs. We present a two-stage approach that addresses quality issues in the training data through automated error detection and correction, followed by parameter-efficient model fine-tuning. Our methodology introduces a vision-augm...

ID: 2511.07010v1 cs.CL, cs.CV, cs.HC

arXiv PDF

📄 History-Aware Reasoning for GUI Agents

2025-11-15

Авторы:

Ziwei Wang, Leyang Yang, Xiaoxuan Tang, Sheng Zhou, Dajun Chen, Wei Jiang, Yong Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Advances in Multimodal Large Language Models have significantly enhanced Graphical User Interface (GUI) automation. Equipping GUI agents with reliable episodic reasoning capabilities is essential for bridging the gap between users' concise task descriptions and the complexities of real-world execution. Current methods integrate Reinforcement Learning (RL) with System-2 Chain-of-Thought, yielding notable gains in reasoning enhancement. For long-horizon GUI tasks, historical interactions connect e...

ID: 2511.09127v1 cs.AI, cs.CL, cs.CV, cs.HC

arXiv PDF

📄 GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction

2025-11-08

Авторы:

Qingzhou Lu, Yao Feng, Baiyu Shi, Michael Piseno, Zhenan Bao, C. Karen Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Humanoid robots are expected to operate in human-centered environments where safe and natural physical interaction is essential. However, most recent reinforcement learning (RL) policies emphasize rigid tracking and suppress external forces. Existing impedance-augmented approaches are typically restricted to base or end-effector control and focus on resisting extreme forces rather than enabling compliance. We introduce GentleHumanoid, a framework that integrates impedance control into a whole-bo...

ID: 2511.04679v1 cs.RO, cs.CV, cs.HC

arXiv PDF

📄 Learning with Category-Equivariant Representations for Human Activity Recognition

2025-11-07

Авторы:

Yoshihiro Maruyama

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Human activity recognition is challenging because sensor signals shift with context, motion, and environment; effective models must therefore remain stable as the world around them changes. We introduce a categorical symmetry-aware learning framework that captures how signals vary over time, scale, and sensor hierarchy. We build these factors into the structure of feature representations, yielding models that automatically preserve the relationships between sensors and remain stable under realis...

ID: 2511.00900v1 cs.LG, cs.AI, cs.CV, cs.HC

arXiv PDF

📄 Accelerating Physical Property Reasoning for Augmented Visual Cognition

2025-11-07

Авторы:

Hongbo Lan, Zhenlin An, Haoyu Li, Vaibhav Singh, Longfei Shangguan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This paper introduces \sysname, a system that accelerates vision-guided physical property reasoning to enable augmented visual cognition. \sysname minimizes the run-time latency of this reasoning pipeline through a combination of both algorithmic and systematic optimizations, including rapid geometric 3D reconstruction, efficient semantic feature fusion, and parallel view encoding. Through these simple yet effective optimizations, \sysname reduces the end-to-end latency of this reasoning pipelin...

ID: 2511.03126v1 cs.CV, cs.HC

arXiv PDF

📄 VitalLens 2.0: High-Fidelity rPPG for Heart Rate Variability Estimation from Face Video

2025-11-04

Авторы:

Philipp V. Rouast

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This report introduces VitalLens 2.0, a new deep learning model for estimating physiological signals from face video. This new model demonstrates a significant leap in accuracy for remote photoplethysmography (rPPG), enabling the robust estimation of not only heart rate (HR) and respiratory rate (RR) but also Heart Rate Variability (HRV) metrics. This advance is achieved through a combination of a new model architecture and a substantial increase in the size and diversity of our training data, n...

ID: 2510.27028v1 cs.CV, cs.HC, 68T45, I.4.9; J.3

arXiv PDF

📄 Using Salient Object Detection to Identify Manipulative Cookie Banners that Circumvent GDPR

2025-11-04

Авторы:

Riley Grossman, Michael Smith, Cristian Borcea, Yi Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The main goal of this paper is to study how often cookie banners that comply with the General Data Protection Regulation (GDPR) contain aesthetic manipulation, a design tactic to draw users' attention to the button that permits personal data sharing. As a byproduct of this goal, we also evaluate how frequently the banners comply with GDPR and the recommendations of national data protection authorities regarding banner designs. We visited 2,579 websites and identified the type of cookie banner im...

ID: 2510.26967v1 cs.CY, cs.AI, cs.CV, cs.HC

arXiv PDF

Показано 11 - 20 из 54 записей