When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
2510.09096v1
cs.RO, cs.AI, cs.LG
2025-10-14
Авторы:
Xinhu Li, Ayush Jain, Zhaojing Yang, Yigit Korkmaz, Erdem Bıyık
Abstract
Learning from demonstrations enables experts to teach robots complex tasks
using interfaces such as kinesthetic teaching, joystick control, and
sim-to-real transfer. However, these interfaces often constrain the expert's
ability to demonstrate optimal behavior due to indirect control, setup
restrictions, and hardware safety. For example, a joystick can move a robotic
arm only in a 2D plane, even though the robot operates in a higher-dimensional
space. As a result, the demonstrations collected by constrained experts lead to
suboptimal performance of the learned policies. This raises a key question: Can
a robot learn a better policy than the one demonstrated by a constrained
expert? We address this by allowing the agent to go beyond direct imitation of
expert actions and explore shorter and more efficient trajectories. We use the
demonstrations to infer a state-only reward signal that measures task progress,
and self-label reward for unknown states using temporal interpolation. Our
approach outperforms common imitation learning in both sample efficiency and
task completion time. On a real WidowX robotic arm, it completes the task in 12
seconds, 10x faster than behavioral cloning, as shown in real-robot videos on
https://sites.google.com/view/constrainedexpert .
Ссылки и действия
Дополнительные ресурсы: