📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 CycliST: A Video Language Model Benchmark for Reasoning on Cyclical State Transitions

2025-12-02

Авторы:

Simon Kohaut, Daniel Ochs, Shun Zhang, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh Dhami

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We present CycliST, a novel benchmark dataset designed to evaluate Video Language Models (VLM) on their ability for textual reasoning over cyclical state transitions. CycliST captures fundamental aspects of real-world processes by generating synthetic, richly structured video sequences featuring periodic patterns in object motion and visual attributes. CycliST employs a tiered evaluation system that progressively increases difficulty through variations in the number of cyclic objects, scene clut...

ID: 2512.01095v1 cs.CV, cs.AI, cs.LG

arXiv PDF

📄 Adaptive Parameter Optimization for Robust Remote Photoplethysmography

2025-12-01

Авторы:

Cecilia G. Morales, Fanurs Chi En Teh, Kai Li, Pushpak Agrawal, Artur Dubrawski

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Remote photoplethysmography (rPPG) enables contactless vital sign monitoring using standard RGB cameras. However, existing methods rely on fixed parameters optimized for particular lighting conditions and camera setups, limiting adaptability to diverse deployment environments. This paper introduces the Projection-based Robust Signal Mixing (PRISM) algorithm, a training-free method that jointly optimizes photometric detrending and color mixing through online parameter adaptation based on signal q...

ID: 2511.21903v1 cs.CV, cs.AI, cs.LG

arXiv PDF

📄 WalkCLIP: Multimodal Learning for Urban Walkability Prediction

2025-12-01

Авторы:

Shilong Xiang, JangHyeon Lee, Min Namgung, Yao-Yi Chiang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Urban walkability is a cornerstone of public health, sustainability, and quality of life. Traditional walkability assessments rely on surveys and field audits, which are costly and difficult to scale. Recent studies have used satellite imagery, street view imagery, or population indicators to estimate walkability, but these single-source approaches capture only one dimension of the walking environment. Satellite data describe the built environment from above, but overlook the pedestrian perspect...

ID: 2511.21947v1 cs.CV, cs.AI, cs.LG

arXiv PDF

📄 Continual Error Correction on Low-Resource Devices

2025-11-28

Авторы:

Kirill Paramonov, Mete Ozay, Aristeidis Mystakidis, Nikolaos Tsalikidis, Dimitrios Sotos, Anastasios Drosou, Dimitrios Tzovaras, Hyunjun Kim, Kiseok Chang, Sangdok Mo, Namwoong Kim, Woojong Yoo, Jijoong Moon, Umberto Michieli

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The proliferation of AI models in everyday devices has highlighted a critical challenge: prediction errors that degrade user experience. While existing solutions focus on error detection, they rarely provide efficient correction mechanisms, especially for resource-constrained devices. We present a novel system enabling users to correct AI misclassifications through few-shot learning, requiring minimal computational resources and storage. Our approach combines server-side foundation model trainin...

ID: 2511.21652v1 cs.CV, cs.AI, cs.LG

arXiv PDF

📄 DeeAD: Dynamic Early Exit of Vision-Language Action for Efficient Autonomous Driving

2025-11-27

Авторы:

Haibo HU, Lianming Huang, Nan Guan, Chun Jason Xue

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-Language Action (VLA) models unify perception, reasoning, and trajectory generation for autonomous driving, but suffer from significant inference latency due to deep transformer stacks. We present DeeAD, a training-free, action-guided early-exit framework that accelerates VLA planning by evaluating the physical feasibility of intermediate trajectories. Instead of relying on confidence scores, DeeAD terminates inference when predicted trajectories align with lightweight planning priors (e....

ID: 2511.20720v1 cs.CV, cs.AI, cs.LG, cs.RO

arXiv PDF

📄 Foundry: Distilling 3D Foundation Models for the Edge

2025-11-27

Авторы:

Guillaume Letellier, Siddharth Srivastava, Frédéric Jurie, Gaurav Sharma

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Foundation models pre-trained with self-supervised learning (SSL) on large-scale datasets have become powerful general-purpose feature extractors. However, their immense size and computational cost make them prohibitive for deployment on edge devices such as robots and AR/VR headsets. Existing compression techniques like standard knowledge distillation create efficient 'specialist' models but sacrifice the crucial, downstream-agnostic generality that makes foundation models so valuable. In this...

ID: 2511.20721v1 cs.CV, cs.AI, cs.LG, cs.NE

arXiv PDF

📄 Automated Monitoring of Cultural Heritage Artifacts Using Semantic Segmentation

2025-11-27

Авторы:

Andrea Ranieri, Giorgio Palmieri, Silvia Biasotti

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This paper addresses the critical need for automated crack detection in the preservation of cultural heritage through semantic segmentation. We present a comparative study of U-Net architectures, using various convolutional neural network (CNN) encoders, for pixel-level crack identification on statues and monuments. A comparative quantitative evaluation is performed on the test set of the OmniCrack30k dataset [1] using popular segmentation metrics including Mean Intersection over Union (mIoU), D...

ID: 2511.20541v1 cs.CV, cs.AI, cs.LG

arXiv PDF

📄 New York Smells: A Large Multimodal Dataset for Olfaction

2025-11-27

Авторы:

Ege Ozguroglu, Junbang Liang, Ruoshi Liu, Mia Chiquier, Michael DeTienne, Wesley Wei Qian, Alexandra Horowitz, Andrew Owens, Carl Vondrick

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

While olfaction is central to how animals perceive the world, this rich chemical sensory modality remains largely inaccessible to machines. One key bottleneck is the lack of diverse, multimodal olfactory training data collected in natural settings. We present New York Smells, a large dataset of paired image and olfactory signals captured ``in the wild.'' Our dataset contains 7,000 smell-image pairs from 3,500 distinct objects across indoor and outdoor environments, with approximately 70$\times$ ...

ID: 2511.20544v1 cs.CV, cs.AI, cs.LG

arXiv PDF

📄 MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

2025-11-27

Авторы:

Chieh-Yun Chen, Zhonghao Wang, Qi Chen, Zhifan Ye, Min Shi, Yue Zhao, Yinan Zhao, Hui Qu, Wei-An Lin, Yiru Shen, Ajinkya Kale, Irfan Essa, Humphrey Shi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax, improving one dimension while degrading others. To address this, we introduce two complementary methods: MapReduce LoRA and Reward-aware Token Embedding (RaTE). MapReduce LoRA trains preference-specific LoRA experts in parallel and iteratively merges them to refin...

ID: 2511.20629v1 cs.CV, cs.AI, cs.LG

arXiv PDF

📄 SPHINX: A Synthetic Environment for Visual Perception and Reasoning

2025-11-27

Авторы:

Md Tanvirul Alam, Saksham Aggarwal, Justin Yang Chae, Nidhi Rastogi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We present Sphinx, a synthetic environment for visual perception and reasoning that targets core cognitive primitives. Sphinx procedurally generates puzzles using motifs, tiles, charts, icons, and geometric primitives, each paired with verifiable ground-truth solutions, enabling both precise evaluation and large-scale dataset construction. The benchmark covers 25 task types spanning symmetry detection, geometric transformations, spatial reasoning, chart interpretation, and sequence prediction. E...

ID: 2511.20814v1 cs.CV, cs.AI, cs.LG

arXiv PDF

Показано 21 - 30 из 358 записей