TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
2509.26627v1
cs.AI, cs.LG, cs.RO
2025-10-02
Authors:
Yuyang Liu, Chuan Wen, Yihang Hu, Dinesh Jayaraman, Yang Gao
Abstract
Designing dense rewards is crucial for reinforcement learning (RL), yet in
robotics it often demands extensive manual effort and lacks scalability. One
promising solution is to view task progress as a dense reward signal, as it
quantifies the degree to which actions advance the system toward task
completion over time. We present TimeRewarder, a simple yet effective reward
learning method that derives progress estimation signals from passive videos,
including robot demonstrations and human videos, by modeling temporal distances
between frame pairs. We then demonstrate how TimeRewarder can supply step-wise
proxy rewards to guide reinforcement learning. In our comprehensive experiments
on ten challenging Meta-World tasks, we show that TimeRewarder dramatically
improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10
tasks with only 200,000 environment interactions per task. This approach
outperforms previous methods, and even the manually designed dense reward
of the environment, in both final success rate and sample efficiency.
Moreover, we show that TimeRewarder pretraining can exploit real-world human
videos, highlighting its potential as a scalable path to rich reward
signals from diverse video sources.
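The frame-pair idea can be illustrated with a minimal sketch. It rests on two assumptions. First, the training label for a pair of frames (i, j) taken from a video of T frames is the signed offset (j - i) / (T - 1). Second, the step-wise proxy reward is the predicted temporal distance from the previous observation to the current one. The helper names (`sample_frame_pairs`, `proxy_rewards`, `distance_model`) are hypothetical, and the trained network is replaced by a toy callable. This is not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sample_frame_pairs(
    num_frames: int, num_pairs: int, seed: int = 0
) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: sample (i, j, target) training triples from one video.

    Assumption: target is the signed temporal distance normalized by episode
    length, so it lies in [-1, 1] and is negative when j precedes i.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames")
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_pairs):
        i = rng.randrange(num_frames)
        j = rng.randrange(num_frames)
        pairs.append((i, j, (j - i) / (num_frames - 1)))
    return pairs


def proxy_rewards(
    frames: Sequence, distance_model: Callable[[object, object], float]
) -> List[float]:
    """Hypothetical helper: dense reward r_t = predicted distance from frame t-1 to t.

    Steps that advance the task get positive reward; regressions get negative reward.
    """
    return [distance_model(frames[t - 1], frames[t]) for t in range(1, len(frames))]


if __name__ == "__main__":
    # Toy stand-in for a learned model: frames are just progress indices 0..4.
    toy_model = lambda a, b: (b - a) / 4
    print(sample_frame_pairs(num_frames=5, num_pairs=3))
    print(proxy_rewards([0, 1, 2, 3, 4], toy_model))
```

Under these assumptions, the per-step rewards of a steadily progressing rollout sum to the normalized progress of the whole episode. A step that moves the system backward receives a negative reward, which is what makes a progress estimate usable as a dense shaping signal.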