📊 Digest statistics

Total digests: 34022 Added today: 0

Last updated: today

📄 Lips-Jaw and Tongue-Jaw Articulatory Tradeoff in DYNARTmo

2025-12-02

Authors:

Bernd J. Kröger

Annotation:

This paper investigates how the dynamic articulatory model DYNARTmo accounts for articulatory tradeoffs between primary and secondary articulators, with a focus on lips-jaw and tongue-jaw coordination. While DYNARTmo does not implement full task-dynamic second-order biomechanics, it adopts first-order task-space gesture specifications comparable to those used in articulatory phonology and integrates a simplified mechanism for distributing articulatory effort across multiple articulators. We firs...

ID: 2511.22155v1 cs.CL, cs.RO

arXiv PDF

📄 Enhancing Underwater Object Detection through Spatio-Temporal Analysis and Spatial Attention Networks

2025-11-01

Authors:

Sai Likhith Karri, Ansh Saxena

Annotation:

This study examines the effectiveness of spatio-temporal modeling and the integration of spatial attention mechanisms in deep learning models for underwater object detection. Specifically, in the first phase, the performance of temporal-enhanced YOLOv5 variant T-YOLOv5 is evaluated, in comparison with the standard YOLOv5. For the second phase, an augmented version of T-YOLOv5 is developed, through the addition of a Convolutional Block Attention Module (CBAM). By examining the effectiveness of th...

ID: 2510.25797v1 cs.CV, cs.CL, cs.RO

arXiv PDF

📄 Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation?

2025-10-30

Authors:

Ziqi Ma, Sao Mai Nguyen, Philippe Xu

Annotation:

Emergent symbolic representations are critical for enabling developmental learning agents to plan and generalize across tasks. In this work, we investigate whether large language models (LLMs) can translate human natural language instructions into the internal symbolic representations that emerge during hierarchical reinforcement learning. We apply a structured evaluation framework to measure the translation performance of commonly seen LLMs -- GPT, Claude, Deepseek and Grok -- across different ...

ID: 2510.24259v1 cs.CL, cs.RO

arXiv PDF

📄 Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views

2025-10-29

Authors:

Anna Deichler, Jonas Beskow

Annotation:

We introduce Look and Tell, a multimodal dataset for studying referential communication across egocentric and exocentric perspectives. Using Meta Project Aria smart glasses and stationary cameras, we recorded synchronized gaze, speech, and video as 25 participants instructed a partner to identify ingredients in a kitchen. Combined with 3D scene reconstructions, this setup provides a benchmark for evaluating how different spatial representations (2D vs. 3D; ego vs. exo) affect multimodal groundin...

ID: 2510.22672v2 cs.CV, cs.CL, cs.RO, I.2.10; I.2.9; I.2.7; H.5.2

arXiv PDF

📄 From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction

2025-10-24

Authors:

Zhida Zhao, Talas Fu, Yifan Wang, Lijun Wang, Huchuan Lu

Annotation:

Despite remarkable progress in driving world models, their potential for autonomous systems remains largely untapped: the world models are mostly learned for world simulation and decoupled from trajectory planning. While recent efforts aim to unify world modeling and planning in a single framework, the synergistic facilitation mechanism of world modeling for planning still requires further exploration. In this work, we introduce a new driving paradigm named Policy World Model (PWM), which not on...

ID: 2510.19654v1 cs.CV, cs.AI, cs.CL, cs.RO

arXiv PDF

📄 SafeCoop: Unravelling Full Stack Safety in Agentic Collaborative Driving

2025-10-23

Authors:

Xiangbo Gao, Tzu-Hsiang Lin, Ruojing Song, Yuheng Wu, Kuan-Ru Huang, Zicheng Jin, Fangzhou Lin, Shinan Liu, Zhengzhong Tu

Annotation:

Collaborative driving systems leverage vehicle-to-everything (V2X) communication across multiple agents to enhance driving safety and efficiency. Traditional V2X systems take raw sensor data, neural features, or perception results as communication media, which face persistent challenges, including high bandwidth demands, semantic loss, and interoperability issues. Recent advances investigate natural language as a promising medium, which can provide semantic richness, decision-level reasoning, an...

ID: 2510.18123v1 cs.CV, cs.AI, cs.CL, cs.RO

arXiv PDF

📄 Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation

2025-10-15

Authors:

Mukul Lokhande, Tanushree Dewangan, Mohd Sharik Mansoori, Tejas Chaudhari, Akarsh J., Damayanti Lokhande, Adam Teman, Santosh Kumar Vishvakarma

Annotation:

This paper introduces Bhasha-Rupantarika, a light and efficient multilingual translation system tailored through algorithm-hardware codesign for resource-limited settings. The method investigates model deployment at sub-octet precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1x reduction in model size (FP4) and a 4.2x speedup in inference speed, which correlates with an increased throughput of 66 tokens/s (improvement by 4.8x). This underscores the importance ...

ID: 2510.10676v1 cs.AR, cs.CL, cs.RO, eess.AS

arXiv PDF

📄 More Than Meets the Eye? Uncovering the Reasoning-Planning Disconnect in Training Vision-Language Driving Models

2025-10-08

Authors:

Xurui Song, Shuo Huai, JingJing Jiang, Jiayi Kong, Jun Luo

Annotation:

Vision-Language Model (VLM) driving agents promise explainable end-to-end autonomy by first producing natural-language reasoning and then predicting trajectory planning. However, whether planning is causally driven by this reasoning remains a critical but unverified assumption. To investigate this, we build DriveMind, a large-scale driving Visual Question Answering (VQA) corpus with plan-aligned Chain-of-Thought (CoT), automatically generated from nuPlan. Our data generation process converts sen...

ID: 2510.04532v1 cs.AI, cs.CL, cs.RO

arXiv PDF

📄 Information Seeking for Robust Decision Making under Partial Observability

2025-10-04

Authors:

Djengo Cyun-Jyun Fang, Tsung-Wei Ke

Annotation:

Explicit information seeking is essential to human problem-solving in practical environments characterized by incomplete information and noisy dynamics. When the true environmental state is not directly observable, humans seek information to update their internal dynamics and inform future decision-making. Although existing Large Language Model (LLM) planning agents have addressed observational uncertainty, they often overlook discrepancies between their internal dynamics and the actual environm...

ID: 2510.01531v1 cs.AI, cs.CL, cs.RO

arXiv PDF

📄 Embodied AI: From LLMs to World Models

2025-09-26

Authors:

Tongtong Feng, Xin Wang, Yu-Gang Jiang, Wenwu Zhu

## Context

Embodied Artificial Intelligence (AI) is an intelligent system paradigm aimed at achieving Artificial General Intelligence (AGI), and it serves as the foundation for a range of applications. It drives the shift from artificial systems in synthetic spaces to intelligent systems that master physical systems. Recent advances in Large Language Models (LLMs) and World Models (WMs) have significantly accelerated the development of embodied AI. LLMs have raised embodied AI to a new level by supporting semantic reasoning and task decomposition, which makes it possible to use natural language for communication during learning. WMs, in turn, let embodied systems build an internal representation of the external world and predict how it will evolve, keeping physical interactions consistent with physical laws. The paper surveys the embodied AI literature in detail, from basic concepts to the latest advances, covering both LLM-driven and WM-driven work.

## Method

The methodology described in the paper spans the full breadth of embodied AI and aims at a comprehensive approach to its development. The technical solutions rely on LLMs to improve natural language processing and understanding, and on WMs to represent the external world and model interaction. The embodied AI architecture, as described, is built on the interplay between these two components. Multimodal LLMs (MLLMs) and WMs are combined into a single system so that it can solve tasks in physical space using natural language and an understanding of physical laws. This methodology is a modern approach to building systems that can not only understand natural language but also interact with the physical world while obeying its laws.

## Results

The experiments described in the paper established a link between LLM optimization and WMs within embodied AI. Various data, including natural language, video, and sensor data, were used to test the effectiveness of different models. The results showed that combining LLMs and WMs significantly improves how systems solve tasks in the physical world. For example, in a scenario that requires controlling a physical interface through natural language, systems based on this architecture showed a significant advantage over models focused on only one of the two aspects.

## Significance

The practical significance of embodied AI is that it can be applied in a variety of fields, including robotics...

Annotation:

Embodied Artificial Intelligence (AI) is an intelligent system paradigm for achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications and driving the evolution from cyberspace to physical systems. Recent breakthroughs in Large Language Models (LLMs) and World Models (WMs) have drawn significant attention for embodied AI. On the one hand, LLMs empower embodied AI via semantic reasoning and task decomposition, bringing high-level natural language instruct...

ID: 2509.20021v1 cs.AI, cs.CL, cs.RO

arXiv PDF

Showing 1 - 10 of 15 entries