📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 0

Последнее обновление: сегодня

📄 Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

2025-12-04

Авторы:

Haoru Xue, Tairan He, Zi Wang, Qingwei Ben, Wenli Xiao, Zhengyi Luo, Xingye Da, Fernando Castañeda, Guanya Shi, Shankar Sastry, Linxi "Jim" Fan, Yuke Zhu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent progress in GPU-accelerated, photorealistic simulation has opened a scalable data-generation path for robot learning, where massive physics and visual randomization allow policies to generalize beyond curated environments. Building on these advances, we develop a teacher-student-bootstrap learning framework for vision-based humanoid loco-manipulation, using articulated-object interaction as a representative high-difficulty benchmark. Our approach introduces a staged-reset exploration stra...

ID: 2512.01061v1 cs.RO, cs.CV

arXiv PDF

📄 TRoVe: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models

2025-12-04

Авторы:

Maya Varma, Jean-Benoit Delbrouck, Sophie Ostmeier, Akshay Chaudhari, Curtis Langlotz

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-language models (VLMs) have made great strides in addressing temporal understanding tasks, which involve characterizing visual changes across a sequence of images. However, recent works have suggested that when making predictions, VLMs may rely on static feature biases, such as background or object features, rather than dynamic visual changes. Static feature biases are a type of shortcut and can contribute to systematic prediction errors on downstream tasks; as a result, identifying and c...

ID: 2512.01048v1 cs.CV

arXiv PDF

📄 Generalized Medical Phrase Grounding

2025-12-04

Авторы:

Wenjun Zhang, Shekhar S. Chandra, Aaron Nicolson

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Medical phrase grounding (MPG) maps textual descriptions of radiological findings to corresponding image regions. These grounded reports are easier to interpret, especially for non-experts. Existing MPG systems mostly follow the referring expression comprehension (REC) paradigm and return exactly one bounding box per phrase. Real reports often violate this assumption. They contain multi-region findings, non-diagnostic text, and non-groundable phrases, such as negations or descriptions of normal ...

ID: 2512.01085v1 cs.CV, cs.CL

arXiv PDF

📄 Learning Eigenstructures of Unstructured Data Manifolds

2025-12-04

Авторы:

Roy Velich, Arkadi Piven, David Bensaïd, Daniel Cremers, Thomas Dagès, Ron Kimmel

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce a novel framework that directly learns a spectral basis for shape and manifold analysis from unstructured data, eliminating the need for traditional operator selection, discretization, and eigensolvers. Grounded in optimal-approximation theory, we train a network to decompose an implicit approximation operator by minimizing the reconstruction error in the learned basis over a chosen distribution of probe functions. For suitable distributions, they can be seen as an approximation of ...

ID: 2512.01103v1 cs.CV

arXiv PDF

📄 Accelerating Inference of Masked Image Generators via Reinforcement Learning

2025-12-04

Авторы:

Pranav Subbaraman, Shufan Li, Siyan Zhao, Aditya Grover

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Masked Generative Models (MGM)s demonstrate strong capabilities in generating high-fidelity images. However, they need many sampling steps to create high-quality generations, resulting in slow inference speed. In this work, we propose Speed-RL, a novel paradigm for accelerating a pretrained MGMs to generate high-quality images in fewer steps. Unlike conventional distillation methods which formulate the acceleration problem as a distribution matching problem, where a few-step student model is tra...

ID: 2512.01094v1 cs.CV

arXiv PDF

📄 Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis

2025-12-04

Авторы:

Yilan Zhang, Li Nanbo, Changchun Yang, Jürgen Schmidhuber, Xin Gao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The integration of histology images and gene profiles has shown great promise for improving survival prediction in cancer. However, current approaches often struggle to model intra- and inter-modal interactions efficiently and effectively due to the high dimensionality and complexity of the inputs. A major challenge is capturing critical prognostic events that, though few, underlie the complexity of the observed inputs and largely determine patient outcomes. These events, manifested as high-leve...

ID: 2512.01116v1 cs.CV

arXiv PDF

📄 Estimation of Kinematic Motion from Dashcam Footage

2025-12-04

Авторы:

Evelyn Zhang, Alex Richardson, Jonathan Sprinkle

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The goal of this paper is to explore the accuracy of dashcam footage to predict the actual kinematic motion of a car-like vehicle. Our approach uses ground truth information from the vehicle's on-board data stream, through the controller area network, and a time-synchronized dashboard camera, mounted to a consumer-grade vehicle, for 18 hours of footage and driving. The contributions of the paper include neural network models that allow us to quantify the accuracy of predicting the vehicle speed ...

ID: 2512.01104v1 cs.RO, cs.CV

arXiv PDF

📄 Weakly Supervised Continuous Micro-Expression Intensity Estimation Using Temporal Deep Neural Network

2025-12-04

Авторы:

Riyadh Mohammed Almushrafy

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Micro-facial expressions are brief and involuntary facial movements that reflect genuine emotional states. While most prior work focuses on classifying discrete micro-expression categories, far fewer studies address the continuous evolution of intensity over time. Progress in this direction is limited by the lack of frame-level intensity labels, which makes fully supervised regression impractical. We propose a unified framework for continuous micro-expression intensity estimation using only we...

ID: 2512.01145v1 cs.CV

arXiv PDF

📄 OmniFD: A Unified Model for Versatile Face Forgery Detection

2025-12-04

Авторы:

Haotian Liu, Haoyu Chen, Chenhui Pan, You Hu, Guoying Zhao, Xiaobai Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Face forgery detection encompasses multiple critical tasks, including identifying forged images and videos and localizing manipulated regions and temporal segments. Current approaches typically employ task-specific models with independent architectures, leading to computational redundancy and ignoring potential correlations across related tasks. We introduce OmniFD, a unified framework that jointly addresses four core face forgery detection tasks within a single model, i.e., image and video clas...

ID: 2512.01128v1 cs.CV

arXiv PDF

📄 TabletopGen: Instance-Level Interactive 3D Tabletop Scene Generation from Text or Single Image

2025-12-04

Авторы:

Ziqian Wang, Yonghao He, Licheng Yang, Wei Zou, Hongxuan Ma, Liu Liu, Wei Sui, Yuxin Guo, Hu Su

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Generating high-fidelity, physically interactive 3D simulated tabletop scenes is essential for embodied AI--especially for robotic manipulation policy learning and data synthesis. However, current text- or image-driven 3D scene generation methods mainly focus on large-scale scenes, struggling to capture the high-density layouts and complex spatial relations that characterize tabletop scenes. To address these challenges, we propose TabletopGen, a training-free, fully automatic framework that gene...

ID: 2512.01204v2 cs.CV

arXiv PDF

1
2
20
21
22
23
24
1161
1162

Показано 211 - 220 из 11614 записей