Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
2510.12523v1
cs.LG, math.OC
2025-10-16
Авторы:
Ahmed Ben Yahmed, Hafedh El Ferchichi, Marc Abeille, Vianney Perchet
Abstract
We examine a multi-armed bandit problem with contextual information, where
the objective is to ensure that each arm receives a minimum aggregated reward
across contexts while simultaneously maximizing the total cumulative reward.
This framework captures a broad class of real-world applications where fair
revenue allocation is critical and contextual variation is inherent. The
cross-context aggregation of minimum reward constraints, while enabling better
performance and easier feasibility, introduces significant technical challenges
-- particularly the absence of closed-form optimal allocations typically
available in standard MAB settings. We design and analyze algorithms that
either optimistically prioritize performance or pessimistically enforce
constraint satisfaction. For each algorithm, we derive problem-dependent upper
bounds on both regret and constraint violations. Furthermore, we establish a
lower bound demonstrating that the dependence on the time horizon in our
results is optimal in general and revealing fundamental limitations of the free
exploration principle leveraged in prior work.
Ссылки и действия
Дополнительные ресурсы: