📊 Digest statistics

Total digests: 34022. Added today: 82.

Last updated: today

📄 Towards Robust Zero-Shot Reinforcement Learning

2025-10-21

Authors:

Kexin Zheng, Lauriane Teyssier, Yinan Zheng, Yu Luo, Xianyuan Zhan


Annotation:

The recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors caused by out-of-distribution (OOD) actions during offline learning sometimes lead to biased representat...

ID: 2510.15382v1 cs.LG, cs.AI, cs.RO


📄 A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control

2025-10-17

Authors:

Nikita Kachaev, Daniil Zelezetsky, Egor Cherepanov, Alexey K. Kovalev, Aleksandr I. Panov


Annotation:

Despite their effectiveness and popularity in offline or model-based reinforcement learning (RL), transformers remain underexplored in online model-free RL due to their sensitivity to training setups and model design decisions such as how to structure the policy and value networks, share components, or handle temporal information. In this paper, we show that transformers can be strong baselines for continuous control in online model-free RL. We investigate key design questions: how to condition ...

ID: 2510.13367v1 cs.LG, cs.AI, cs.RO


📄 Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents

2025-10-17

Authors:

Johan Obando-Ceron, Walter Mayor, Samuel Lavoie, Scott Fujimoto, Aaron Courville, Pablo Samuel Castro


Annotation:

Recent works have proposed accelerating the wall-clock training time of actor-critic methods via the use of large-scale environment parallelization; unfortunately, these can sometimes still require a large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layer...

ID: 2510.13704v1 cs.LG, cs.AI, cs.RO


📄 Align2Act: Instruction-Tuned Models for Human-Aligned Autonomous Driving

2025-10-16

Authors:

Kanishkha Jaisankar, Sunidhi Tandel


Annotation:

Motion planning in complex scenarios is a core challenge in autonomous driving. Conventional methods apply predefined rules or learn from driving data to generate trajectories, while recent approaches leverage large language models (LLMs) for decision-making. However, it remains unclear whether LLMs truly capture human driving logic. We propose Align2Act, a motion planning framework that transforms instruction-tuned LLMs into interpretable planners aligned with human behavior. We derive structur...

ID: 2510.10503v1 cs.LG, cs.AI, cs.RO


📄 DEAS: DEtached value learning with Action Sequence for Scalable Offline RL

2025-10-11

Authors:

Changyeon Kim, Haeone Lee, Younggyo Seo, Kimin Lee, Yuke Zhu


Annotation:

Offline reinforcement learning (RL) presents an attractive paradigm for training intelligent agents without expensive online interactions. However, current approaches still struggle with complex, long-horizon sequential decision making. In this work, we introduce DEtached value learning with Action Sequence (DEAS), a simple yet effective offline RL framework that leverages action sequences for value learning. These temporally extended actions provide richer information than single-step actions a...

ID: 2510.07730v1 cs.LG, cs.AI, cs.RO


📄 DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning

2025-10-10

Authors:

Ke Guo, Haochen Liu, Xiaojun Wu, Chen Lv


Annotation:

Realistic traffic simulation is critical for the development of autonomous driving systems and urban mobility planning, yet existing imitation learning approaches often fail to model realistic traffic behaviors. Behavior cloning suffers from covariate shift, while Generative Adversarial Imitation Learning (GAIL) is notoriously unstable in multi-agent settings. We identify a key source of this instability: irrelevant interaction misguidance, where a discriminator penalizes an ego vehicle's realis...

ID: 2510.06913v1 cs.LG, cs.AI, cs.RO


📄 Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report

2025-10-10

Authors:

Riccardo Mereu, Aidan Scannell, Yuxin Hou, Yi Zhao, Aditya Jitta, Antonio Dominguez, Luigi Acerbi, Amos Storkey, Paul Chang


Annotation:

World models are a powerful paradigm in AI and robotics, enabling agents to reason about the future by predicting visual observations or compact latent states. The 1X World Model Challenge introduces an open-source benchmark of real-world humanoid interaction, with two complementary tracks: sampling, focused on forecasting future image frames, and compression, focused on predicting future discrete latent codes. For the sampling track, we adapt the video generation foundation model Wan-2.2 TI2V-5...

ID: 2510.07092v1 cs.LG, cs.AI, cs.RO


📄 ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL

2025-10-10

Authors:

Egor Cherepanov, Alexey K. Kovalev, Aleksandr I. Panov


Annotation:

Real-world robotic agents must act under partial observability and long horizons, where key cues may appear long before they affect decision making. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent or transformer models struggle with retaining and leveraging long-term dependencies: context windows truncate history, while naive memory extensions fail under scale and sparsity. We propose ELMUR (External Layer...

ID: 2510.07151v1 cs.LG, cs.AI, cs.RO


📄 A KL-regularization framework for learning to plan with adaptive priors

2025-10-08

Authors:

Álvaro Serra-Gomez, Daniel Jarne Ornia, Dhruva Tirumala, Thomas Moerland


Annotation:

Effective exploration remains a central challenge in model-based reinforcement learning (MBRL), particularly in high-dimensional continuous control tasks where sample efficiency is crucial. A prominent line of recent work leverages learned policies as proposal distributions for Model-Predictive Path Integral (MPPI) planning. Initial approaches update the sampling policy independently of the planner distribution, typically maximizing a learned value function with deterministic policy gradient and...

ID: 2510.04280v1 cs.LG, cs.AI, cs.RO


📄 Predictive Preference Learning from Human Interventions

2025-10-04

Authors:

Haoyuan Cai, Zhenghao Peng, Bolei Zhou


Annotation:

Learning from human involvement aims to incorporate the human subject to monitor and correct agent behavior errors. Although most interactive imitation learning methods focus on correcting the agent's action at the current state, they do not adjust its actions in future states, which may be potentially more hazardous. To address this, we introduce Predictive Preference Learning from Human Interventions (PPL), which leverages the implicit preference signals contained in human interventions to inf...

ID: 2510.01545v1 cs.LG, cs.AI, cs.RO

