📊 Digest statistics

Total digests: 34022. Added today: 0.

Last updated: today

📄 Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation

2025-11-07

Authors:

Xiangyu Shi, Zerui Li, Yanyuan Qiao, Qi Wu

Abstract:

Recent advances in Vision-and-Language Navigation in Continuous Environments (VLN-CE) have leveraged multimodal large language models (MLLMs) to achieve zero-shot navigation. However, existing methods often rely on panoramic observations and two-stage pipelines involving waypoint predictors, which introduce significant latency and limit real-world applicability. In this work, we propose Fast-SmartWay, an end-to-end zero-shot VLN-CE framework that eliminates the need for panoramic views and waypo...

ID: 2511.00933v1 cs.RO, cs.CV

📄 LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping

2025-11-07

Authors:

Lijie Wang, Lianjie Guo, Ziyi Xu, Qianhao Wang, Fei Gao, Xieyuanli Chen

Abstract:

Reconstructing large-scale colored point clouds is an important task in robotics, supporting perception, navigation, and scene understanding. Despite advances in LiDAR inertial visual odometry (LIVO), its performance remains highly sensitive to extrinsic calibration. Meanwhile, 3D vision foundation models, such as VGGT, suffer from limited scalability in large environments and inherently lack metric scale. To overcome these limitations, we propose LiDAR-VGGT, a novel framework that tightly coupl...

ID: 2511.01186v1 cs.RO, cs.CV

📄 Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects

2025-11-07

Authors:

Jiawei Wang, Dingyou Wang, Jiaming Hu, Qixuan Zhang, Jingyi Yu, Lan Xu

Abstract:

A deep understanding of kinematic structures and movable components is essential for enabling robots to manipulate objects and model their own articulated forms. Such understanding is captured through articulated objects, which are essential for tasks such as physical simulation, motion planning, and policy learning. However, creating these models, particularly for objects with high degrees of freedom (DoF), remains a significant challenge. Existing methods typically rely on motion sequences or ...

ID: 2511.01294v2 cs.RO, cs.CV

📄 Comprehensive Assessment of LiDAR Evaluation Metrics: A Comparative Study Using Simulated and Real Data

2025-11-07

Authors:

Syed Mostaquim Ali, Taufiq Rahman, Ghazal Farhani, Mohamed H. Zaki, Benoit Anctil, Dominique Charlebois

Abstract:

For developing safe Autonomous Driving Systems (ADS), rigorous testing is required before they are deemed safe for road deployments. Since comprehensive conventional physical testing is impractical due to cost and safety concerns, Virtual Testing Environments (VTE) can be adopted as an alternative. Comparing VTE-generated sensor outputs against their real-world analogues can be a strong indication that the VTE accurately represents reality. Correspondingly, this work explores a comprehensive exp...

ID: 2511.02994v1 cs.RO, cs.CV

📄 OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera

2025-11-07

Authors:

Hao Shi, Ze Wang, Shangwei Guo, Mengfei Duan, Song Wang, Teng Chen, Kailun Yang, Lin Wang, Kaiwei Wang

Abstract:

Robust 3D semantic occupancy is crucial for legged/humanoid robots, yet most semantic scene completion (SSC) systems target wheeled platforms with forward-facing sensors. We present OneOcc, a vision-only panoramic SSC framework designed for gait-introduced body jitter and 360° continuity. OneOcc combines: (i) Dual-Projection fusion (DP-ER) to exploit the annular panorama and its equirectangular unfolding, preserving 360° continuity and grid alignment; (ii) Bi-Grid Voxelization (BGV) to...

ID: 2511.03571v1 cs.RO, cs.CV, eess.IV

📄 Flying Robotics Art: ROS-based Drone Draws the Record-Breaking Mural

2025-11-07

Authors:

Andrei A. Korigodskii, Oleg D. Kalachev, Artem E. Vasiunik, Matvei V. Urvantsev, Georgii E. Bondar

Abstract:

This paper presents the innovative design and successful deployment of a pioneering autonomous unmanned aerial system developed for executing the world's largest mural painted by a drone. Addressing the dual challenges of maintaining artistic precision and operational reliability under adverse outdoor conditions such as wind and direct sunlight, our work introduces a robust system capable of navigating and painting outdoors with unprecedented accuracy. Key to our approach is a novel navigation s...

ID: 2511.03651v1 cs.RO, cs.CV, cs.SY, eess.SY, I.2.9; J.5

📄 MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence

2025-11-06

Authors:

Renjun Gao, Peiyan Zhong

Abstract:

Multimodal large language models (MLLMs) have shown remarkable capabilities in cross-modal understanding and reasoning, offering new opportunities for intelligent assistive systems, yet existing systems still struggle with risk-aware planning, user personalization, and grounding language plans into executable skills in cluttered homes. We introduce MARS - a Multi-Agent Robotic System powered by MLLMs for assistive intelligence and designed for smart home robots supporting people with disabilitie...

ID: 2511.01594v1 cs.RO, cs.CV, I.2.9; I.2.11; I.2.6; I.4.8

📄 Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process

2025-11-06

Authors:

Jiayi Chen, Wenxuan Song, Pengxiang Ding, Ziyang Zhou, Han Zhao, Feilong Tang, Donglin Wang, Haoang Li

Abstract:

Vision-language-action (VLA) models aim to understand natural language instructions and visual observations and to execute corresponding actions as an embodied agent. Recent work integrates future images into the understanding-acting loop, yielding unified VLAs that jointly understand, generate, and act -- reading text and images and producing future images and actions. However, these models either rely on external experts for modality unification or treat image generation and action prediction ...

ID: 2511.01718v1 cs.RO, cs.CV

📄 TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System

2025-11-06

Authors:

Yanjie Ze, Siheng Zhao, Weizhuo Wang, Angjoo Kanazawa, Rocky Duan, Pieter Abbeel, Guanya Shi, Jiajun Wu, C. Karen Liu

Abstract:

Large-scale data has driven breakthroughs in robotics, from language models to vision-language-action models in bimanual manipulation. However, humanoid robotics lacks equally effective data collection frameworks. Existing humanoid teleoperation systems either use decoupled control or depend on expensive motion capture setups. We introduce TWIST2, a portable, mocap-free humanoid teleoperation and data collection system that preserves full whole-body control while advancing scalability. Our syste...

ID: 2511.02832v1 cs.RO, cs.CV, cs.LG

📄 Self-localization on a 3D map by fusing global and local features from a monocular camera

2025-11-01

Authors:

Satoshi Kikuch, Masaya Kato, Tsuyoshi Tasaki

Abstract:

Self-localization on a 3D map by using an inexpensive monocular camera is required to realize autonomous driving. Self-localization based on a camera often uses a convolutional neural network (CNN) that can extract local features that are calculated by nearby pixels. However, when dynamic obstacles, such as people, are present, CNN does not work well. This study proposes a new method combining CNN with Vision Transformer, which excels at extracting global features that show the relationship of p...

ID: 2510.26170v1 cs.RO, cs.CV

Showing 61-70 of 225 records