📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 0

Последнее обновление: сегодня

📄 INSIGHT: INference-time Sequence Introspection for Generating Help Triggers in Vision-Language-Action Models

2025-10-04

Авторы:

Ulas Berk Karli, Ziyao Shangguan, Tesca FItzgerald

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Recent Vision-Language-Action (VLA) models show strong generalization capabilities, yet they lack introspective mechanisms for anticipating failures and requesting help from a human supervisor. We present \textbf{INSIGHT}, a learning framework for leveraging token-level uncertainty signals to predict when a VLA should request help. Using $\pi_0$-FAST as the underlying model, we extract per-token \emph{entropy}, \emph{log-probability}, and Dirichlet-based estimates of \emph{aleatoric and epistemi...

ID: 2510.01389v1 cs.RO, cs.AI, cs.LG

arXiv PDF

📄 AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation

2025-10-04

Авторы:

Anukriti Singh, Kasra Torshizi, Khuzema Habib, Kelin Yu, Ruohan Gao, Pratap Tokekar

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-based robot learning often relies on dense image or point-cloud inputs, which are computationally heavy and entangle irrelevant background features. Existing keypoint-based approaches can focus on manipulation-centric features and be lightweight, but either depend on manual heuristics or task-coupled selection, limiting scalability and semantic understanding. To address this, we propose AFFORD2ACT, an affordance-guided framework that distills a minimal set of semantic 2D keypoints from a ...

ID: 2510.01433v1 cs.RO, cs.AI

arXiv PDF

📄 VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs

2025-10-04

Авторы:

Mohamad Al Mdfaa, Svetlana Lukina, Timur Akhtyamov, Arthur Nigmatzyanov, Dmitrii Nalberskii, Sergey Zagoruyko, Gonzalo Ferrer

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-language models (VLMs) have shown potential for robot navigation but encounter fundamental limitations: they lack persistent scene memory, offer limited spatial reasoning, and do not scale effectively with video duration for real-time application. We present VL-KnG, a Visual Scene Understanding system that tackles these challenges using spatiotemporal knowledge graph construction and computationally efficient query processing for navigation goal identification. Our approach processes vide...

ID: 2510.01483v1 cs.RO, cs.AI

arXiv PDF

📄 PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization

2025-10-04

Авторы:

Zixing Lei, Zibo Zhou, Sheng Yin, Yueru Chen, Qingyao Xu, Weixin Li, Yunhong Wang, Bowei Tang, Wei Jing, Siheng Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias, the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate the effect of simulator inductive bias, the key idea is to train policies jointly across multiple simulators, encouraging the learned controller to ...

ID: 2510.01708v1 cs.RO, cs.AI

arXiv PDF

📄 Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving

2025-10-04

Авторы:

Haibo Hu, Lianming Huang, Xinyu Wang, Yufei Cui, Nan Guan, Chun Jason Xue

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Vision-Language Models (VLMs) are increasingly applied in autonomous driving for unified perception and reasoning, but high inference latency hinders real-time deployment. Early-exit reduces latency by terminating inference at intermediate layers, yet its task-dependent nature limits generalization across diverse scenarios. We observe that this limitation aligns with autonomous driving: navigation systems can anticipate upcoming contexts (e.g., intersections, traffic lights), indicating which ta...

ID: 2510.01795v1 cs.RO, cs.AI

arXiv PDF

📄 TACOS: Task Agnostic COordinator of a multi-drone System

2025-10-04

Авторы:

Alessandro Nazzari, Roberto Rubinacci, Marco Lovera

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

When a single pilot is responsible for managing a multi-drone system, the task demands varying levels of autonomy, from direct control of individual UAVs, to group-level coordination, to fully autonomous swarm behaviors for accomplishing high-level tasks. Enabling such flexible interaction requires a framework that supports multiple modes of shared autonomy. As language models continue to improve in reasoning and planning, they provide a natural foundation for such systems, reducing pilot worklo...

ID: 2510.01869v1 cs.RO, cs.AI, cs.MA

arXiv PDF

📄 Autonomous Multi-Robot Infrastructure for AI-Enabled Healthcare Delivery and Diagnostics

2025-10-03

Авторы:

Nakhul Kalaivanan, Senthil Arumugam Muthukumaraswamy, Girish Balasubramanian

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This research presents a multi-robot system for inpatient care, designed using swarm intelligence principles and incorporating wearable health sensors, RF-based communication, and AI-driven decision support. Within a simulated hospital environment, the system adopts a leader-follower swarm configuration to perform patient monitoring, medicine delivery, and emergency assistance. Due to ethical constraints, live patient trials were not conducted; instead, validation was carried out through control...

ID: 2509.26106v1 cs.RO, cs.AI, 68T40, 68T05, I.2.9; I.2.6

arXiv PDF

📄 LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and Search

2025-10-02

Авторы:

Ruiyang Wang, Haolun Tsu, David Hunt, Shaocheng Luo, Jiwoo Kim, Miroslav Pajic

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Autonomous exploration and object search in unknown indoor environments remain challenging for multi-robot systems (MRS). Traditional approaches often rely on greedy frontier assignment strategies with limited inter-robot coordination. In this work, we introduce LLM-MCoX (LLM-based Multi-robot Coordinated Exploration and Search), a novel framework that leverages Large Language Models (LLMs) for intelligent coordination of both homogeneous and heterogeneous robot teams tasked with efficient explo...

ID: 2509.26324v1 cs.RO, cs.AI, cs.MA

arXiv PDF

📄 SDA-PLANNER: State-Dependency Aware Adaptive Planner for Embodied Task Planning

2025-10-02

Авторы:

Zichao Shen, Chen Gao, Jiaqi Yuan, Tianchen Zhu, Xingcheng Fu, Qingyun Sun

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Embodied task planning requires agents to produce executable actions in a close-loop manner within the environment. With progressively improving capabilities of LLMs in task decomposition, planning, and generalization, current embodied task planning methods adopt LLM-based architecture.However, existing LLM-based planners remain limited in three aspects, i.e., fixed planning paradigms, lack of action sequence constraints, and error-agnostic. In this work, we propose SDA-PLANNER, enabling an adap...

ID: 2509.26375v1 cs.RO, cs.AI, cs.CV

arXiv PDF

📄 OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

2025-10-02

Авторы:

Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locom...

ID: 2509.26633v1 cs.RO, cs.AI, cs.LG, cs.SY, eess.SY

arXiv PDF

Показано 271 - 280 из 544 записей