Multi-Task Reinforcement Learning with Language-Encoded Gated Policy Networks

2510.06138v1 cs.LG, cs.AI, I.2.6 2025-10-09
Authors:

Rushiv Arora

Abstract

Multi-task reinforcement learning often relies on task metadata -- such as brief natural-language descriptions -- to guide behavior across diverse objectives. We present Lexical Policy Networks (LEXPOL), a language-conditioned mixture-of-policies architecture for multi-task RL. LEXPOL encodes task metadata with a text encoder and uses a learned gating module to select or blend among multiple sub-policies, enabling end-to-end training across tasks. On MetaWorld benchmarks, LEXPOL matches or exceeds strong multi-task baselines in success rate and sample efficiency, without task-specific retraining. To analyze the mechanism, we further study settings with fixed expert policies obtained independently of the gate and show that the learned language gate composes these experts to produce behaviors appropriate to novel task descriptions and unseen task combinations. These results indicate that natural-language metadata can effectively index and recombine reusable skills within a single policy.
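The gating mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hashed bag-of-words `encode_text` is a toy stand-in for a real text encoder, the sub-policies are plain linear maps, the weights are random rather than trained, and all names and dimensions (`GatedPolicy`, `EMBED_DIM`, `NUM_POLICIES`, and so on) are hypothetical. It only shows how a task description might be turned into mixture weights that blend the outputs of several sub-policies.

```python
import zlib

import numpy as np

# All sizes below are illustrative assumptions, not values from the paper.
EMBED_DIM = 16     # size of the task-description embedding
STATE_DIM = 4      # size of the observation vector
ACTION_DIM = 2     # size of the action vector
NUM_POLICIES = 3   # number of sub-policies the gate blends


def encode_text(description: str) -> np.ndarray:
    """Toy stand-in for a text encoder: hashed bag-of-words, L2-normalized."""
    vec = np.zeros(EMBED_DIM)
    for token in description.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % EMBED_DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


class GatedPolicy:
    """Blends linear sub-policies with weights computed from a task description."""

    def __init__(self, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Gate: linear map from text embedding to one logit per sub-policy.
        self.gate_w = rng.normal(scale=0.5, size=(NUM_POLICIES, EMBED_DIM))
        # Sub-policies: one linear map from state to action per sub-policy.
        self.policy_w = rng.normal(
            scale=0.5, size=(NUM_POLICIES, ACTION_DIM, STATE_DIM)
        )

    def gate(self, description: str) -> np.ndarray:
        """Return mixture weights over sub-policies (non-negative, sum to 1)."""
        return softmax(self.gate_w @ encode_text(description))

    def act(self, state: np.ndarray, description: str) -> np.ndarray:
        """Return the gate-weighted blend of all sub-policy actions."""
        weights = self.gate(description)       # shape (NUM_POLICIES,)
        per_policy = self.policy_w @ state     # shape (NUM_POLICIES, ACTION_DIM)
        return weights @ per_policy            # shape (ACTION_DIM,)


if __name__ == "__main__":
    policy = GatedPolicy(seed=0)
    state = np.ones(STATE_DIM)
    print("gate weights:", policy.gate("open the drawer"))
    print("blended action:", policy.act(state, "open the drawer"))
```

In an end-to-end setting, the gate and sub-policy parameters would be trained jointly with an RL objective. In the fixed-expert setting the abstract mentions, only the gate would be updated while the sub-policies stay frozen.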
