📊 Digest statistics

Total digests: 34123
Added today: 101

Last updated: today

📄 Convergence and stability of Q-learning in Hierarchical Reinforcement Learning

2025-11-25

Authors:

Massimiliano Manenti, Andrea Iannelli

Abstract:

Hierarchical Reinforcement Learning promises, among other benefits, to efficiently capture and utilize the temporal structure of a decision-making problem and to enhance continual learning capabilities, but theoretical guarantees lag behind practice. In this paper, we propose a Feudal Q-learning scheme and investigate under which conditions its coupled updates converge and are stable. By leveraging the theory of Stochastic Approximation and the ODE method, we present a theorem stating the conver...
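
For orientation, here is a minimal sketch of plain single-level tabular Q-learning, the algorithm whose coupled, hierarchical version the paper analyzes. It is not the authors' Feudal scheme. The three-state chain environment `chain_step` and all step-size, discount, and exploration parameters are hypothetical choices made for illustration. Each update is a stochastic-approximation step toward the Bellman target, which is the kind of iteration the ODE method studies.

```python
import random

def q_learning(n_states, n_actions, step, gamma=0.9, alpha=0.1,
               episodes=2000, horizon=20, epsilon=0.2, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` is a (hypothetical) environment returning
    (next_state, reward, done). Episodes always start in state 0.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)          # explore
            else:
                a = max(range(n_actions), key=lambda b: Q[s][b])  # exploit
            s_next, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s_next])
            # Stochastic-approximation step toward the Bellman target.
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q

def chain_step(s, a):
    """Hypothetical toy chain (not from the paper): action 1 moves right,
    action 0 moves left (clamped at 0); entering state 2 gives reward 1
    and ends the episode."""
    s_next = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    done = s_next == 2
    return s_next, (1.0 if done else 0.0), done

Q = q_learning(3, 2, chain_step)
```

On this deterministic chain with discount 0.9, the optimal values are Q*(1, right) = 1 and Q*(0, right) = 0.9, so the learned table should approach those entries, and moving right should become the greedy action in both non-terminal states.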

ID: 2511.17351v1 cs.LG, eess.SY, math.OC

📄 Harnessing Data from Clustered LQR Systems: Personalized and Collaborative Policy Optimization

2025-11-25

Authors:

Vinay Kanakeri, Shivam Bajaj, Ashwin Verma, Vijay Gupta, Aritra Mitra

Abstract:

It is known that reinforcement learning (RL) is data-hungry. To improve sample-efficiency of RL, it has been proposed that the learning algorithm utilize data from 'approximately similar' processes. However, since the process models are unknown, identifying which other processes are similar poses a challenge. In this work, we study this problem in the context of the benchmark Linear Quadratic Regulator (LQR) setting. Specifically, we consider a setting with multiple agents, each corresponding to...
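
As background on the benchmark, here is a sketch of how the optimal LQR gain is computed when the model is known. It uses a hypothetical scalar system x_{t+1} = a x_t + b u_t with stage cost q x_t^2 + r u_t^2. The paper's setting differs: its models are unknown, and data is shared across clustered agents. The sketch only shows the Riccati fixed point that such data-driven methods ultimately target.

```python
def scalar_lqr(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Infinite-horizon discrete-time LQR for a known scalar system,
    obtained by iterating the Riccati recursion to its fixed point.
    (Illustrative sketch; not the paper's data-driven algorithm.)"""
    p = q
    for _ in range(max_iter):
        # Riccati recursion: P <- q + a^2 P - (a b P)^2 / (r + b^2 P)
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    k = a * b * p / (r + b * b * p)  # optimal policy: u_t = -k * x_t
    return p, k

p, k = scalar_lqr(1.0, 1.0, 1.0, 1.0)
```

With a = b = q = r = 1, the fixed point satisfies P^2 - P - 1 = 0, so P = (1 + √5)/2 ≈ 1.618 and k = P/(1 + P) ≈ 0.618. The closed-loop coefficient a - b k ≈ 0.382 lies inside the unit circle, so the controlled system is stable.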

ID: 2511.17489v1 cs.LG, eess.SY, math.OC

📄 Optimizing Operation Recipes with Reinforcement Learning for Safe and Interpretable Control of Chemical Processes

2025-11-22

Authors:

Dean Brandner, Sergio Lucia

Abstract:

Optimal operation of chemical processes is vital for energy, resource, and cost savings in chemical engineering. The problem of optimal operation can be tackled with reinforcement learning, but traditional reinforcement learning methods face challenges due to hard constraints related to quality and safety that must be strictly satisfied, and the large amount of required training data. Chemical processes often cannot provide sufficient experimental data, and while detailed dynamic models can be a...

ID: 2511.16297v1 cs.LG, eess.SY

📄 VersaPants: A Loose-Fitting Textile Capacitive Sensing System for Lower-Body Motion Capture

2025-11-22

Authors:

Deniz Kasap, Taraneh Aminosharieh Najafi, Jérôme Paul Rémy Thevenot, Jonathan Dan, Stefano Albini, David Atienza

Abstract:

We present VersaPants, the first loose-fitting, textile-based capacitive sensing system for lower-body motion capture, built on the open-hardware VersaSens platform. By integrating conductive textile patches and a compact acquisition unit into a pair of pants, the system reconstructs lower-body pose without compromising comfort. Unlike IMU-based systems that require user-specific fitting or camera-based methods that compromise privacy, our approach operates without fitting adjustments and preser...

ID: 2511.16346v1 eess.SP, cs.LG, eess.SY

📄 Synthesis of Safety Specifications for Probabilistic Systems

2025-11-22

Authors:

Gaspard Ohlmann, Edwin Hamel-De le Court, Francesco Belardinelli

Abstract:

Ensuring that agents satisfy safety specifications can be crucial in safety-critical environments. While methods exist for controller synthesis with safe temporal specifications, most existing methods restrict safe temporal specifications to probabilistic-avoidance constraints. Formal methods typically offer more expressive ways to express safety in probabilistic systems, such as Probabilistic Computation Tree Logic (PCTL) formulas. Thus, in this paper, we develop a new approach that supports mo...

ID: 2511.16579v1 cs.LO, cs.AI, cs.LG, eess.SY

📄 Wasserstein Distributionally Robust Nash Equilibrium Seeking with Heterogeneous Data: A Lagrangian Approach

2025-11-20

Authors:

Zifan Wang, Georgios Pantazis, Sergio Grammatico, Michael M. Zavlanos, Karl H. Johansson

Abstract:

We study a class of distributionally robust games where agents are allowed to heterogeneously choose their risk aversion with respect to distributional shifts of the uncertainty. In our formulation, heterogeneous Wasserstein ball constraints on each distribution are enforced through a penalty function leveraging a Lagrangian formulation. We then formulate the distributionally robust Nash equilibrium problem and show that under certain assumptions it is equivalent to a finite-dimensional variatio...

ID: 2511.14048v1 math.OC, cs.LG, eess.SY

📄 Robust Verification of Controllers under State Uncertainty via Hamilton-Jacobi Reachability Analysis

2025-11-20

Authors:

Albert Lin, Alessandro Pinto, Somil Bansal

Abstract:

As perception-based controllers for autonomous systems become increasingly popular in the real world, it is important that we can formally verify their safety and performance despite perceptual uncertainty. Unfortunately, the verification of such systems remains challenging, largely due to the complexity of the controllers, which are often nonlinear, nonconvex, learning-based, and/or black-box. Prior works propose verification algorithms that are based on approximate reachability methods, but th...

ID: 2511.14755v1 cs.RO, cs.LG, eess.SY

📄 Fusion-ResNet: A Lightweight multi-label NILM Model Using PCA-ICA Feature Fusion

2025-11-19

Authors:

Sahar Moghimian Hoosh, Ilia Kamyshev, Henni Ouerdane

Abstract:

Non-intrusive load monitoring (NILM) is an advanced load monitoring technique that uses data-driven algorithms to disaggregate the total power consumption of a household into the consumption of individual appliances. However, real-world NILM deployment still faces major challenges, including overfitting, low model generalization, and disaggregating a large number of appliances operating at the same time. To address these challenges, this work proposes an end-to-end framework for the NILM classif...

ID: 2511.12139v1 cs.LG, eess.SY

📄 Logarithmic Regret and Polynomial Scaling in Online Multi-step-ahead Prediction

2025-11-19

Authors:

Jiachen Qian, Yang Zheng

Abstract:

This letter studies the problem of online multi-step-ahead prediction for unknown linear stochastic systems. Using conditional distribution theory, we derive an optimal parameterization of the prediction policy as a linear function of future inputs, past inputs, and past outputs. Based on this characterization, we propose an online least-squares algorithm to learn the policy and analyze its regret relative to the optimal model-based predictor. We show that the online algorithm achieves logarithm...
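
Here is a minimal sketch of online least squares, implemented as recursive least squares (RLS), that learns a linear predictor from a regressor stacking a past output and a past input. As a simplifying assumption, it fits only a one-step-ahead predictor for a hypothetical noiseless first-order system y_{t+1} = 0.8 y_t + 0.5 u_t. The letter's multi-step parameterization, which also uses future inputs, and its regret analysis are not reproduced. The initial covariance scale `delta` is likewise an illustrative choice.

```python
import random

def rls(samples, n, delta=1000.0):
    """Recursive least squares: after each sample, theta equals the
    least-squares fit to all data seen so far (plus a ridge term 1/delta)."""
    theta = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in samples:
        Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(phi[i] * Pphi[i] for i in range(n))
        gain = [v / denom for v in Pphi]
        err = y - sum(theta[i] * phi[i] for i in range(n))  # prediction error
        theta = [theta[i] + gain[i] * err for i in range(n)]
        # Covariance update P <- P - gain (P phi)^T; P stays symmetric.
        P = [[P[i][j] - gain[i] * Pphi[j] for j in range(n)] for i in range(n)]
    return theta

def simulate(steps=200, seed=1):
    """Hypothetical system y_{t+1} = 0.8 y_t + 0.5 u_t driven by random inputs."""
    rng = random.Random(seed)
    y, data = 0.0, []
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)
        y_next = 0.8 * y + 0.5 * u
        data.append(([y, u], y_next))  # regressor: past output, past input
        y = y_next
    return data

theta = rls(simulate(), n=2)
```

Because each update costs O(n^2) and never revisits old data, RLS suits the online setting. On this noiseless example the estimate should land very close to the true coefficients [0.8, 0.5].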

ID: 2511.12467v1 cs.LG, eess.SY

📄 DiffFP: Learning Behaviors from Scratch via Diffusion-based Fictitious Play

2025-11-19

Authors:

Akash Karthikeyan, Yash Vardhan Pant

Abstract:

Self-play reinforcement learning has demonstrated significant success in learning complex strategic and interactive behaviors in competitive multi-agent games. However, achieving such behaviors in continuous decision spaces remains challenging. Ensuring adaptability and generalization in self-play settings is critical for achieving competitive performance in dynamic multi-agent environments. These challenges often cause methods to converge slowly or fail to converge at all to a Nash equilibrium,...
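
Fictitious play, named in the title, has each player repeatedly best-respond to the empirical frequencies of the opponent's past actions. Here is a minimal sketch of the classical discrete version on a finite zero-sum matrix game. It does not model the diffusion-based component of the paper: it uses an exact argmax over two actions instead of continuous decision spaces, and the game (matching pennies) and iteration count are illustrative assumptions.

```python
def fictitious_play(A, iterations=10_000):
    """Classical discrete-time fictitious play for a two-player zero-sum
    matrix game: player 1 maximizes x^T A y, player 2 minimizes it.
    Returns both players' empirical action frequencies."""
    m, n = len(A), len(A[0])
    c1, c2 = [0] * m, [0] * n  # how often each action has been played
    for _ in range(iterations):
        # Both players best-respond (simultaneously) to the opponent's counts.
        pay1 = [sum(A[i][j] * c2[j] for j in range(n)) for i in range(m)]
        cost2 = [sum(A[i][j] * c1[i] for i in range(m)) for j in range(n)]
        i_star = max(range(m), key=lambda i: pay1[i])
        j_star = min(range(n), key=lambda j: cost2[j])
        c1[i_star] += 1
        c2[j_star] += 1
    return [c / iterations for c in c1], [c / iterations for c in c2]

# Matching pennies: player 1 wins (+1) on a match, loses (-1) otherwise.
x, y = fictitious_play([[1, -1], [-1, 1]])
```

For two-player zero-sum games, a classical result of Robinson shows that these empirical frequencies converge to a Nash equilibrium. Here that equilibrium is the uniform mixed strategy (0.5, 0.5) for both players, even though the actual play keeps cycling.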

ID: 2511.13186v1 cs.LG, eess.SY

Showing 21-30 of 42 entries