📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction

2025-11-11

Авторы:

Yiting He, Zhishuai Liu, Weixin Wang, Pan Xu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Off-dynamics reinforcement learning (RL), where training and deployment transition dynamics are different, can be formulated as learning in a robust Markov decision process (RMDP) where uncertainties in transition dynamics are imposed. Existing literature mostly assumes access to generative models allowing arbitrary state-action queries or pre-collected datasets with a good state coverage of the deployment environment, bypassing the challenge of exploration. In this work, we study a more realist...

ID: 2511.05396v1 cs.LG, cs.AI, cs.RO, stat.ML

arXiv PDF

📄 Path-Coordinated Continual Learning with Neural Tangent Kernel-Justified Plasticity: A Theoretical Framework with Near State-of-the-Art Performance

2025-11-06

Авторы:

Rathin Chandra Shit

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Catastrophic forgetting is one of the fundamental issues of continual learning because neural networks forget the tasks learned previously when trained on new tasks. The proposed framework is a new path-coordinated framework of continual learning that unites the Neural Tangent Kernel (NTK) theory of principled plasticity bounds, statistical validation by Wilson confidence intervals, and evaluation of path quality by the use of multiple metrics. Experimental evaluation shows an average accuracy o...

ID: 2511.02025v1 cs.LG, cs.AI, cs.RO

arXiv PDF

📄 Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

2025-10-31

Авторы:

Nikita Kachaev, Mikhail Kolosov, Daniil Zelezetsky, Alexey K. Kovalev, Aleksandr I. Panov

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

The growing success of Vision-Language-Action (VLA) models stems from the promise that pretrained Vision-Language Models (VLMs) can endow agents with transferable world knowledge and vision-language (VL) grounding, laying a foundation for action models with broader generalization. Yet when these VLMs are adapted to the action modality, it remains unclear to what extent their original VL representations and knowledge are preserved. In this work, we conduct a systematic study of representation ret...

ID: 2510.25616v1 cs.LG, cs.AI, cs.RO

arXiv PDF

📄 Learning Parameterized Skills from Demonstrations

2025-10-30

Авторы:

Vedant Gupta, Haotian Fu, Calvin Luo, Yiding Jiang, George Konidaris

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We present DEPS, an end-to-end algorithm for discovering parameterized skills from expert demonstrations. Our method learns parameterized skill policies jointly with a meta-policy that selects the appropriate discrete skill and continuous parameters at each timestep. Using a combination of temporal variational inference and information-theoretic regularization methods, we address the challenge of degeneracy common in latent variable models, ensuring that the learned skills are temporally extende...

ID: 2510.24095v1 cs.LG, cs.AI, cs.RO

arXiv PDF

📄 Sample-efficient and Scalable Exploration in Continuous-Time RL

2025-10-30

Авторы:

Klemens Iten, Lenart Treven, Bhavya Sukhija, Florian Dörfler, Andreas Krause

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we study the problem of continuous-time reinforcement learning, where the unknown system dynamics are represented using nonlinear ordinary differential equations (ODEs). We leverage probabilistic models, such as Gaussian processes and Bayesian neural networks, to learn an uncertainty-aware model of the underlying O...

ID: 2510.24482v1 cs.LG, cs.AI, cs.RO

arXiv PDF

📄 ESCORT: Efficient Stein-variational and Sliced Consistency-Optimized Temporal Belief Representation for POMDPs

2025-10-28

Авторы:

Yunuo Zhang, Baiting Luo, Ayan Mukhopadhyay, Gabor Karsai, Abhishek Dubey

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

In Partially Observable Markov Decision Processes (POMDPs), maintaining and updating belief distributions over possible underlying states provides a principled way to summarize action-observation history for effective decision-making under uncertainty. As environments grow more realistic, belief distributions develop complexity that standard mathematical models cannot accurately capture, creating a fundamental challenge in maintaining representational accuracy. Despite advances in deep learning ...

ID: 2510.21107v1 cs.LG, cs.AI, cs.RO

arXiv PDF

📄 Semantic World Models

2025-10-24

Авторы:

Jacob Berg, Chuning Zhu, Yanda Bao, Ishan Durugkar, Abhishek Gupta

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Planning with world models offers a powerful paradigm for robotic control. Conventional approaches train a model to predict future frames conditioned on current frames and actions, which can then be used for planning. However, the objective of predicting future pixels is often at odds with the actual planning objective; strong pixel reconstruction does not always correlate with good planning decisions. This paper posits that instead of reconstructing future frames as pixels, world models only ne...

ID: 2510.19818v1 cs.LG, cs.AI, cs.RO

arXiv PDF

📄 SPACeR: Self-Play Anchoring with Centralized Reference Models

2025-10-23

Авторы:

Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, Matthew Strong, Masayoshi Tomizuka, Yihan Hu, Wei Zhan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable. Achieving this requires sim agent policies that are human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data, producing realistic policies. However, these models are computationally ...

ID: 2510.18060v1 cs.LG, cs.AI, cs.RO, I.2.9; I.2.6

arXiv PDF

📄 Actor-Free Continuous Control via Structurally Maximizable Q-Functions

2025-10-23

Авторы:

Yigit Korkmaz, Urvi Bhuwania, Ayush Jain, Erdem Bıyık

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating Q-values for individual state-action pairs. In continuous action spaces, evaluating the Q-value over the entire action space becomes computationally infeasible. To address this, actor-critic methods are typically employed, where a critic is trained on off-policy data...

ID: 2510.18828v1 cs.LG, cs.AI, cs.RO, stat.ML

arXiv PDF

📄 An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning

2025-10-22

Авторы:

Lindsay Spoor, Álvaro Serra-Gómez, Aske Plaat, Thomas Moerland

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

In safety-critical domains such as robotics, navigation and power systems, constrained optimization problems arise where maximizing performance must be carefully balanced with associated constraints. Safe reinforcement learning provides a framework to address these challenges, with Lagrangian methods being a popular choice. However, the effectiveness of Lagrangian methods crucially depends on the choice of the Lagrange multiplier $\lambda$, which governs the trade-off between return and constrai...

ID: 2510.17564v1 cs.LG, cs.AI, cs.RO, cs.SY, eess.SY

arXiv PDF

Показано 11 - 20 из 41 записей