📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via neural Action Tokenization

2025-12-05

Авторы:

Yicheng Liu, Shiduo Zhang, Zibin Dong, Baijun Ye, Tianyuan Yuan, Xiaopeng Yu, Linqi Yin, Chenhao Lu, Junhao Shi, Luca Jiang-Tao Yu, Liangtao Zheng, Tao Jiang, Jingjing Gong, Xipeng Qiu, Hang Zhao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Autoregressive vision-language-action (VLA) models have recently demonstrated strong capabilities in robotic manipulation. However, their core process of action tokenization often involves a trade-off between reconstruction fidelity and inference efficiency. We introduce FASTer, a unified framework for efficient and generalizable robot learning that integrates a learnable tokenizer with an autoregressive policy built upon it. FASTerVQ encodes action chunks as single-channel images, capturing glo...

ID: 2512.04952v1 cs.CV, cs.RO

arXiv PDF

📄 Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints

2025-12-05

Авторы:

Minghan Zhu, Zhiyi Wang, Qihang Sun, Maani Ghaffari, Michael Posa

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Object geometry is key information for robot manipulation. Yet, object reconstruction is a challenging task because cameras only capture partial observations of objects, especially when occlusion occurs. In this paper, we leverage two extra sources of information to reduce the ambiguity of vision signals. First, generative models learn priors of the shapes of commonly seen objects, allowing us to make reasonable guesses of the unseen part of geometry. Second, contact information, which can be ob...

ID: 2512.05079v1 cs.CV, cs.RO

arXiv PDF

📄 Image Generation as a Visual Planner for Robotic Manipulation

2025-12-04

Авторы:

Ye Pang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Generating realistic robotic manipulation videos is an important step toward unifying perception, planning, and action in embodied agents. While existing video diffusion models require large domain-specific datasets and struggle to generalize, recent image generation models trained on language-image corpora exhibit strong compositionality, including the ability to synthesize temporally coherent grid images. This suggests a latent capacity for video-like generation even without explicit temporal ...

ID: 2512.00532v1 cs.CV, cs.RO

arXiv PDF

📄 TrajDiff: End-to-end Autonomous Driving without Perception Annotation

2025-12-04

Авторы:

Xingtai Gui, Jianbo Zhao, Wencheng Han, Jikai Wang, Jiahao Gong, Feiyang Tan, Cheng-zhong Xu, Jianbing Shen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

End-to-end autonomous driving systems directly generate driving policies from raw sensor inputs. While these systems can extract effective environmental features for planning, relying on auxiliary perception tasks, developing perception annotation-free planning paradigms has become increasingly critical due to the high cost of manual perception annotation. In this work, we propose TrajDiff, a Trajectory-oriented BEV Conditioned Diffusion framework that establishes a fully perception annotation-f...

ID: 2512.00723v1 cs.CV, cs.RO

arXiv PDF

📄 SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead

2025-12-04

Авторы:

Chaojun Ni, Cheng Chen, Xiaofeng Wang, Zheng Zhu, Wenzhao Zheng, Boyuan Wang, Tianrun Chen, Guosheng Zhao, Haoyun Li, Zhehao Dong, Qiang Zhang, Yun Ye, Yang Wang, Guan Huang, Wenjun Mei

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-Language-Action (VLA) models built on pretrained Vision-Language Models (VLMs) show strong potential but are limited in practicality due to their large parameter counts. To mitigate this issue, using a lightweight VLM has been explored, but it compromises spatiotemporal reasoning. Although some methods suggest that incorporating additional 3D inputs can help, they usually rely on large VLMs to fuse 3D and 2D inputs and still lack temporal understanding. Therefore, we propose SwiftVLA, an ...

ID: 2512.00903v1 cs.CV, cs.RO

arXiv PDF

📄 SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge

2025-12-04

Авторы:

Yumeng He, Ying Jiang, Jiayin Lu, Yin Yang, Chenfanfu Jiang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Articulated 3D objects are critical for embodied AI, robotics, and interactive scene understanding, yet creating simulation-ready assets remains labor-intensive and requires expert modeling of part hierarchies and motion structures. We introduce SPARK, a framework for reconstructing physically consistent, kinematic part-level articulated objects from a single RGB image. Given an input image, we first leverage VLMs to extract coarse URDF parameters and generate part-level reference images. We the...

ID: 2512.01629v2 cs.CV, cs.RO

arXiv PDF

📄 Register Any Point: Scaling 3D Point Cloud Registration by Flow Matching

2025-12-04

Авторы:

Yue Pan, Tao Sun, Liyuan Zhu, Lucas Nunes, Iro Armeni, Jens Behley, Cyrill Stachniss

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Point cloud registration aligns multiple unposed point clouds into a common frame, and is a core step for 3D reconstruction and robot localization. In this work, we cast registration as conditional generation: a learned continuous, point-wise velocity field transports noisy points to a registered scene, from which the pose of each view is recovered. Unlike previous methods that conduct correspondence matching to estimate the transformation between a pair of point clouds and then optimize the pai...

ID: 2512.01850v1 cs.CV, cs.RO

arXiv PDF

📄 Data-Centric Visual Development for Self-Driving Labs

2025-12-04

Авторы:

Anbang Liu, Guanzhong Hu, Jiayi Wang, Ping Guo, Han Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Self-driving laboratories offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows in the biological sciences. Yet their stringent precision requirements demand highly robust models whose training relies on large amounts of annotated data. However, this kind of data is difficult to obtain in routine practice, especially negative samples. In this work, we focus on pipetting, the most critical and precision sensitive action in SDLs. To overcome...

ID: 2512.02018v1 cs.CV, cs.RO

arXiv PDF

📄 nuScenes Revisited: Progress and Challenges in Autonomous Driving

2025-12-04

Авторы:

Whye Kit Fong, Venice Erin Liong, Kok Seang Tan, Holger Caesar

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS) have been revolutionized by Deep Learning. As a data-driven approach, Deep Learning relies on vast amounts of driving data, typically labeled in great detail. As a result, datasets, alongside hardware and algorithms, are foundational building blocks for the development of AVs. In this work we revisit one of the most widely used autonomous driving datasets: the nuScenes dataset. nuScenes exemplifies key trends in AV developmen...

ID: 2512.02448v1 cs.CV, cs.RO

arXiv PDF

📄 Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation Models

2025-12-04

Авторы:

Pierpaolo Serio, Giulio Pisaneschi, Andrea Dan Ryals, Vincenzo Infantino, Lorenzo Gentilini, Valentina Donzella, Lorenzo Pollini

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This work presents a systematic investigation into how alternative LiDAR-to-image projections affect metric place recognition when coupled with a state-of-the-art vision foundation model. We introduce a modular retrieval pipeline that controls for backbone, aggregation, and evaluation protocol, thereby isolating the influence of the 2-D projection itself. Using consistent geometric and structural channels across multiple datasets and deployment scenarios, we identify the projection characteristi...

ID: 2512.02897v1 cs.CV, cs.RO

arXiv PDF

Показано 1 - 10 из 246 записей