TubeDAgger: Reducing the Number of Expert Interventions with Stochastic Reach-Tubes
2510.00906v1
eess.SY, cs.AI, cs.LG, cs.SY
2025-10-04
Authors:
Julian Lemmel, Manuel Kranzl, Adam Lamine, Philipp Neubauer, Radu Grosu, Sophie A. Neubauer
Abstract
Interactive Imitation Learning deals with training a novice policy from
expert demonstrations in an online fashion. The established DAgger algorithm
trains a robust novice policy by alternating between interacting with the
environment and retraining the network. Many variants exist that
differ in how they decide whether to let the novice act or to return
control to the expert. We propose the use of stochastic reach-tubes, common in
the verification of dynamical systems, as a novel method for estimating the
necessity of expert intervention. Our approach does not require fine-tuning of
decision thresholds per environment and effectively reduces the number of
expert interventions, especially when compared with related approaches that
make use of a doubt classification model.
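
The interaction loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 1-D dynamics, the proportional `expert`, the linear novice `u = w * x`, and the gate `needs_expert` are all hypothetical. In particular, the gate is only a crude Monte Carlo stand-in for a stochastic reach-tube: it rolls the novice forward under sampled noise and hands control to the expert if any rollout leaves an assumed safe interval.

```python
import numpy as np

def expert(x):
    # Hypothetical expert: proportional controller driving the state to 0.
    return -0.5 * x

def step(x, u, rng, noise=0.05):
    # Toy 1-D dynamics with additive Gaussian noise (assumed for illustration).
    return x + u + noise * rng.standard_normal()

def needs_expert(x, w, rng, horizon=5, samples=20, bound=2.0, noise=0.05):
    # Crude stand-in for a reach-tube check (an assumption, not the paper's
    # method): simulate the novice from x under sampled noise and request
    # the expert if any rollout leaves the safe interval |x| <= bound.
    xs = np.full(samples, x, dtype=float)
    for _ in range(horizon):
        xs = xs + w * xs + noise * rng.standard_normal(samples)
        if np.any(np.abs(xs) > bound):
            return True
    return False

def run_dagger(iterations=5, episode_len=30, seed=0):
    rng = np.random.default_rng(seed)
    w = 0.5  # novice gain, deliberately unstable before any training
    states, actions = [], []
    interventions = []
    for _ in range(iterations):
        x, count = 1.0, 0
        for _ in range(episode_len):
            u_expert = expert(x)
            # DAgger-style aggregation: label every visited state
            # with the expert's action.
            states.append(x)
            actions.append(u_expert)
            if needs_expert(x, w, rng):
                u = u_expert  # expert takes over
                count += 1
            else:
                u = w * x     # novice acts
            x = step(x, u, rng)
        interventions.append(count)
        # Retrain the novice on the aggregated data (1-D least squares).
        X, U = np.array(states), np.array(actions)
        w = float(X @ U / (X @ X))
    return w, interventions

if __name__ == "__main__":
    w, interventions = run_dagger()
    print("learned gain:", w)
    print("expert interventions per iteration:", interventions)
```

In this toy setting the gate depends only on a safe-set bound and a lookahead horizon rather than on a learned doubt score. Whether that matches the thresholds the paper avoids tuning is not something the abstract specifies.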