Fitting Reinforcement Learning Model to Behavioral Data under Bandits
2511.04454v1
cs.CE, cs.LG, math.OC, q-bio.NC, 90C25, 90C59, 90C90
2025-11-08
Authors:
Hao Zhu, Jasper Hoffmann, Baohe Zhang, Joschka Boedecker
Abstract
We consider the problem of fitting a reinforcement learning (RL) model to
behavioral data collected in a multi-armed bandit environment. These models
have received much attention in recent years for characterizing human and
animal decision-making behavior. We provide a generic mathematical optimization
problem formulation for the fitting problem of a wide range of RL models that
appear frequently in scientific research applications, followed by a detailed
theoretical analysis of its convexity properties. Based on the theoretical
results, we introduce a novel solution method for the fitting problem of RL
models based on convex relaxation and optimization. Our method is then
evaluated in several simulated bandit environments and compared with
benchmark methods that appear in the literature. Numerical results indicate
that our method achieves comparable performance to the state-of-the-art, while
significantly reducing computation time. We also provide an open-source Python
package for our proposed method to empower researchers to apply it in the
analysis of their datasets directly, without prior knowledge of convex
optimization.
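
To make the fitting problem concrete, the following is a minimal sketch of the conventional maximum-likelihood formulation for one common model class: a Q-learning agent with a softmax choice rule in a Bernoulli bandit. It is not the convex-relaxation method proposed in the paper, and it does not use the authors' package. The model form, the parameter names (alpha for the learning rate, beta for the inverse temperature), the grid search, and all function names are assumptions made for illustration only.

```python
import math
import random


def softmax_log_probs(q_values, beta):
    """Log choice probabilities of a softmax policy with inverse temperature beta."""
    logits = [beta * q for q in q_values]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def negative_log_likelihood(alpha, beta, actions, rewards, n_arms):
    """NLL of the observed choices under the assumed Q-learning + softmax model."""
    q = [0.0] * n_arms
    nll = 0.0
    for a, r in zip(actions, rewards):
        nll -= softmax_log_probs(q, beta)[a]
        q[a] += alpha * (r - q[a])  # delta-rule update of the chosen arm only
    return nll


def simulate(alpha, beta, reward_probs, n_trials, rng):
    """Generate synthetic behavioral data from the same assumed model."""
    n_arms = len(reward_probs)
    q = [0.0] * n_arms
    actions, rewards = [], []
    for _ in range(n_trials):
        probs = [math.exp(lp) for lp in softmax_log_probs(q, beta)]
        a = rng.choices(range(n_arms), weights=probs)[0]
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        actions.append(a)
        rewards.append(r)
        q[a] += alpha * (r - q[a])
    return actions, rewards


def fit_grid(actions, rewards, n_arms, alphas, betas):
    """Exhaustive grid search over (alpha, beta); a simple stand-in for a real solver."""
    best = (None, None, math.inf)
    for alpha in alphas:
        for beta in betas:
            nll = negative_log_likelihood(alpha, beta, actions, rewards, n_arms)
            if nll < best[2]:
                best = (alpha, beta, nll)
    return best


rng = random.Random(0)
# Hypothetical ground truth: alpha = 0.3, beta = 3.0, two arms.
actions, rewards = simulate(0.3, 3.0, [0.2, 0.8], 500, rng)

alphas = [0.05 * k for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
betas = [0.5 * k for k in range(0, 17)]    # 0.0, 0.5, ..., 8.0
alpha_hat, beta_hat, nll_hat = fit_grid(actions, rewards, 2, alphas, betas)
print(f"alpha_hat={alpha_hat:.2f}, beta_hat={beta_hat:.1f}, NLL={nll_hat:.2f}")
```

The likelihood in this sketch is generally not convex in (alpha, beta) jointly, which is why a brute-force grid is used here instead of a local optimizer. The grid cost grows quickly with the number of parameters, which gives some intuition for why a convex reformulation of the fitting problem could reduce computation time.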