📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors

2025-10-22

Авторы:

Zhengshen Zhang, Hao Li, Yalun Dai, Zhengbang Zhu, Lei Zhou, Chenchen Liu, Dong Wang, Francis E. H. Tay, Sijin Chen, Ziwei Liu, Yuxiao Liu, Xinghang Li, Pan Zhou

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Existing vision-language-action (VLA) models act in 3D real-world but are typically built on 2D encoders, leaving a spatial reasoning gap that limits generalization and adaptability. Recent 3D integration techniques for VLAs either require specialized sensors and transfer poorly across modalities, or inject weak cues that lack geometry and degrade vision-language alignment. In this work, we introduce FALCON (From Spatial to Action), a novel paradigm that injects rich 3D spatial tokens into the a...

ID: 2510.17439v1 cs.RO, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 Learning Human-Humanoid Coordination for Collaborative Object Carrying

2025-10-18

Авторы:

Yushi Du, Yixuan Li, Baoxiong Jia, Yutang Lin, Pei Zhou, Wei Liang, Yanchao Yang, Siyuan Huang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Human-humanoid collaboration shows significant promise for applications in healthcare, domestic assistance, and manufacturing. While compliant robot-human collaboration has been extensively developed for robotic arms, enabling compliant human-humanoid collaboration remains largely unexplored due to humanoids' complex whole-body dynamics. In this paper, we propose a proprioception-only reinforcement learning approach, COLA, that combines leader and follower behaviors within a single policy. The m...

ID: 2510.14293v1 cs.RO, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 A Density-Informed Multimodal Artificial Intelligence Framework for Improving Breast Cancer Detection Across All Breast Densities

2025-10-18

Авторы:

Siva Teja Kakileti, Bharath Govindaraju, Sudhakar Sampangi, Geetha Manjunath

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Mammography, the current standard for breast cancer screening, has reduced sensitivity in women with dense breast tissue, contributing to missed or delayed diagnoses. Thermalytix, an AI-based thermal imaging modality, captures functional vascular and metabolic cues that may complement mammographic structural data. This study investigates whether a breast density-informed multi-modal AI framework can improve cancer detection by dynamically selecting the appropriate imaging modality based on breas...

ID: 2510.14340v1 eess.IV, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks

2025-10-18

Авторы:

Mingxuan Yan, Yuping Wang, Zechun Liu, Jiachen Li

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

To tackle long-horizon tasks, recent hierarchical vision-language-action (VLAs) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic subtasks can devia...

ID: 2510.14968v1 cs.RO, cs.AI, cs.CV, cs.LG, cs.SY, eess.SY

arXiv PDF

📄 Dedelayed: Deleting remote inference delay via on-device correction

2025-10-17

Авторы:

Dan Jacobellis, Mateen Ulhaq, Fabien Racapé, Hyomin Choi, Neeraja J. Yadwadkar

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Remote inference allows lightweight devices to leverage powerful cloud models. However, communication network latency makes predictions stale and unsuitable for real-time tasks. To address this, we introduce Dedelayed, a delay-corrective method that mitigates arbitrary remote inference delays, allowing the local device to produce low-latency outputs in real time. Our method employs a lightweight local model that processes the current frame and fuses in features that a heavyweight remote model co...

ID: 2510.13714v1 eess.IV, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 Team Xiaomi EV-AD VLA: Learning to Navigate Socially Through Proactive Risk Perception -- Technical Report for IROS 2025 RoboSense Challenge Social Navigation Track

2025-10-11

Авторы:

Erjia Xiao, Lingfeng Zhang, Yingbo Tang, Hao Cheng, Renjing Xu, Wenbo Ding, Lei Zhou, Long Chen, Hangjun Ye, Xiaoshuai Hao

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

In this report, we describe the technical details of our submission to the IROS 2025 RoboSense Challenge Social Navigation Track. This track focuses on developing RGBD-based perception and navigation systems that enable autonomous agents to navigate safely, efficiently, and socially compliantly in dynamic human-populated indoor environments. The challenge requires agents to operate from an egocentric perspective using only onboard sensors including RGB-D observations and odometry, without access...

ID: 2510.07871v1 cs.RO, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 AI-Driven Radiology Report Generation for Traumatic Brain Injuries

2025-10-11

Авторы:

Riadh Bouslimi, Houda Trabelsi, Wahiba Ben Abdssalem Karaa, Hana Hedhli

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Traumatic brain injuries present significant diagnostic challenges in emergency medicine, where the timely interpretation of medical images is crucial for patient outcomes. In this paper, we propose a novel AI-based approach for automatic radiology report generation tailored to cranial trauma cases. Our model integrates an AC-BiFPN with a Transformer architecture to capture and process complex medical imaging data such as CT and MRI scans. The AC-BiFPN extracts multi-scale features, enabling the...

ID: 2510.08498v1 eess.IV, cs.AI, cs.CV, cs.LG, 68T07, 68U10, I.2.10; I.2.7; I.4.5

arXiv PDF

📄 How to Teach Large Multimodal Models New Skills

2025-10-11

Авторы:

Zhen Zhu, Yiming Gong, Yao Xiao, Yaoyao Liu, Derek Hoiem

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

How can we teach large multimodal models (LMMs) new skills without erasing prior abilities? We study sequential fine-tuning on five target skills while monitoring general ability on eight held-out benchmarks across three model families. We observe that apparent "forgetting" on held-out tasks after narrow fine-tuning can partly recover at later stages. We trace this behavior to a measurable shift in the output token distribution, manifested through a simple counting-bias probe that co-varies with...

ID: 2510.08564v1 cs.AI, cs.CV, cs.LG

arXiv PDF

📄 Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

2025-10-10

Авторы:

Kento Kawaharazuka, Jihoon Oh, Jun Yamada, Ingmar Posner, Yuke Zhu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Amid growing efforts to leverage advances in large language models (LLMs) and vision-language models (VLMs) for robotics, Vision-Language-Action (VLA) models have recently gained significant attention. By unifying vision, language, and action data at scale, which have traditionally been studied separately, VLA models aim to learn policies that generalise across diverse tasks, objects, embodiments, and environments. This generalisation capability is expected to enable robots to solve novel downst...

ID: 2510.07077v1 cs.RO, cs.AI, cs.CV, cs.LG

arXiv PDF

📄 EmbodiSwap for Zero-Shot Robot Imitation Learning

2025-10-08

Авторы:

Eadom Dessalene, Pavan Mantripragada, Michael Maynord, Yiannis Aloimonos

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce EmbodiSwap - a method for producing photorealistic synthetic robot overlays over human video. We employ EmbodiSwap for zero-shot imitation learning, bridging the embodiment gap between in-the-wild ego-centric human video and a target robot embodiment. We train a closed-loop robot manipulation policy over the data produced by EmbodiSwap. We make novel use of V-JEPA as a visual backbone, repurposing V-JEPA from the domain of video understanding to imitation learning over synthetic rob...

ID: 2510.03706v1 cs.RO, cs.AI, cs.CV, cs.LG

arXiv PDF

Показано 41 - 50 из 124 записей