📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Dual-Stream Spectral Decoupling Distillation for Remote Sensing Object Detection

2025-12-05

Авторы:

Xiangyi Gao, Danpei Zhao, Bo Yuan, Wentao Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Knowledge distillation is an effective and hardware-friendly method, which plays a key role in lightweighting remote sensing object detection. However, existing distillation methods often encounter the issue of mixed features in remote sensing images (RSIs), and neglect the discrepancies caused by subtle feature variations, leading to entangled knowledge confusion. To address these challenges, we propose an architecture-agnostic distillation method named Dual-Stream Spectral Decoupling Distillat...

ID: 2512.04413v1 cs.CV, cs.AI

arXiv PDF

📄 Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models

2025-12-05

Авторы:

Manar Alnaasan, Md Selim Sarowar, Sungho Kim

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Accurate and interpretable gait analysis plays a crucial role in the early detection of Parkinsons disease (PD),yet most existing approaches remain limited by single-modality inputs, low robustness, and a lack of clinical transparency. This paper presents an explainable multimodal framework that integrates RGB and Depth (RGB-D) data to recognize Parkinsonian gait patterns under realistic conditions. The proposed system employs dual YOLOv11-based encoders for modality-specific feature extraction,...

ID: 2512.04425v1 cs.CV, cs.AI

arXiv PDF

📄 GuidNoise: Single-Pair Guided Diffusion for Generalized Noise Synthesis

2025-12-05

Авторы:

Changjin Kim, HyeokJun Lee, YoungJoon Yoo

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent image denoising methods have leveraged generative modeling for real noise synthesis to address the costly acquisition of real-world noisy data. However, these generative models typically require camera metadata and extensive target-specific noisy-clean image pairs, often showing limited generalization between settings. In this paper, to mitigate the prerequisites, we propose a Single-Pair Guided Diffusion for generalized noise synthesis GuidNoise, which uses a single noisy/clean pair as t...

ID: 2512.04456v1 cs.CV, cs.AI

arXiv PDF

📄 PhyVLLM: Physics-Guided Video Language Model with Motion-Appearance Disentanglement

2025-12-05

Авторы:

Yu-Wei Zhan, Xin Wang, Hong Chen, Tongtong Feng, Wei Feng, Ren Wang, Guangyao Li, Qing Li, Wenwu Zhu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Video Large Language Models (Video LLMs) have shown impressive performance across a wide range of video-language tasks. However, they often fail in scenarios requiring a deeper understanding of physical dynamics. This limitation primarily arises from their reliance on appearance-based matching. Incorporating physical motion modeling is crucial for deeper video understanding, but presents three key challenges: (1) motion signals are often entangled with appearance variations, making it difficult ...

ID: 2512.04532v1 cs.CV, cs.AI

arXiv PDF

📄 Detection of Intoxicated Individuals from Facial Video Sequences via a Recurrent Fusion Model

2025-12-05

Авторы:

Bita Baroutian, Atefe Aghaei, Mohsen Ebrahimi Moghaddam

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Alcohol consumption is a significant public health concern and a major cause of accidents and fatalities worldwide. This study introduces a novel video-based facial sequence analysis approach dedicated to the detection of alcohol intoxication. The method integrates facial landmark analysis via a Graph Attention Network (GAT) with spatiotemporal visual features extracted using a 3D ResNet. These features are dynamically fused with adaptive prioritization to enhance classification performance. Add...

ID: 2512.04536v1 cs.CV, cs.AI

arXiv PDF

📄 When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question Answering

2025-12-05

Авторы:

Tao Wu, Chuhao Zhou, Guangyu Zhao, Haozhi Cao, Yewen Pu, Jianfei Yang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Embodied Question Answering (EQA) requires an agent to interpret language, perceive its environment, and navigate within 3D scenes to produce responses. Existing EQA benchmarks assume that every question must be answered, but embodied agents should know when they do not have sufficient information to answer. In this work, we focus on a minimal requirement for EQA agents, abstention: knowing when to withhold an answer. From an initial study of 500 human queries, we find that 32.4% contain missing...

ID: 2512.04597v1 cs.CV, cs.AI, cs.LG, cs.RO

arXiv PDF

📄 SEASON: Mitigating Temporal Hallucination in Video Large Language Models via Self-Diagnostic Contrastive Decoding

2025-12-05

Авторы:

Chang-Hsun Wu, Kai-Po Chang, Yu-Yang Sheng, Hung-Kai Chung, Kuei-Chun Wang, Yu-Chiang Frank Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Video Large Language Models (VideoLLMs) have shown remarkable progress in video understanding. However, these models still struggle to effectively perceive and exploit rich temporal information in videos when responding to user queries. Therefore, they often generate descriptions of events that are temporal inconsistent or causally implausible, causing severe hallucination issues. While most prior studies have focused on spatial hallucinations (e.g. object mismatches), temporal reasoning in vide...

ID: 2512.04643v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Measuring the Unspoken: A Disentanglement Model and Benchmark for Psychological Analysis in the Wild

2025-12-05

Авторы:

Yigui Feng, Qinglin Wang, Haotian Mo, Yang Liu, Ke Liu, Gencheng Liu, Xinhai Chen, Siqi Shen, Songzhu Mei, Jie Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Generative psychological analysis of in-the-wild conversations faces two fundamental challenges: (1) existing Vision-Language Models (VLMs) fail to resolve Articulatory-Affective Ambiguity, where visual patterns of speech mimic emotional expressions; and (2) progress is stifled by a lack of verifiable evaluation metrics capable of assessing visual grounding and reasoning depth. We propose a complete ecosystem to address these twin challenges. First, we introduce Multilevel Insight Network for Di...

ID: 2512.04728v1 cs.CV, cs.AI

arXiv PDF

📄 E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving

2025-12-05

Авторы:

Yihong Tang, Haicheng Liao, Tong Nie, Junlin He, Ao Qu, Kehua Chen, Wei Ma, Zhenning Li, Lijun Sun, Chengzhong Xu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

End-to-end autonomous driving (AD) systems increasingly adopt vision-language-action (VLA) models, yet they typically ignore the passenger's emotional state, which is central to comfort and AD acceptance. We introduce Open-Domain End-to-End (OD-E2E) autonomous driving, where an autonomous vehicle (AV) must interpret free-form natural-language commands, infer the emotion, and plan a physically feasible trajectory. We propose E3AD, an emotion-aware VLA framework that augments semantic understandin...

ID: 2512.04733v1 cs.CV, cs.AI

arXiv PDF

📄 ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching

2025-12-05

Авторы:

Guanbo Huang, Jingjia Mao, Fanding Huang, Fengkai Liu, Xiangyang Luo, Yaoyuan Liang, Jiasheng Lu, Xiaoe Wang, Pei Liu, Ruiliu Fu, Shao-Lun Huang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Despite tremendous recent progress, Flow Matching methods still suffer from exposure bias due to discrepancies in training and inference. This paper investigates the root causes of exposure bias in Flow Matching, including: (1) the model lacks generalization to biased inputs during training, and (2) insufficient low-frequency content captured during early denoising, leading to accumulated bias. Based on these insights, we propose ReflexFlow, a simple and effective reflexive refinement of the Flo...

ID: 2512.04904v1 cs.CV, cs.AI

arXiv PDF

Показано 1 - 10 из 2274 записей