Tight Regret Upper and Lower Bounds for Optimistic Hedge in Two-Player Zero-Sum Games
2510.11691v1
cs.LG, cs.GT, stat.ML
2025-10-15
Authors:
Taira Tsuchiya
Abstract
In two-player zero-sum games, the learning dynamic based on optimistic Hedge
achieves one of the best-known regret upper bounds among strongly-uncoupled
learning dynamics. With an appropriately chosen learning rate, the social and
individual regrets can be bounded by $O(\log(mn))$ in terms of the numbers of
actions $m$ and $n$ of the two players. This study investigates the optimality
of the dependence on $m$ and $n$ in the regret of optimistic Hedge. To this
end, we begin by refining existing regret analysis and show that, in the
strongly-uncoupled setting where the opponent's number of actions is known,
both the social and individual regret bounds can be improved to $O(\sqrt{\log m
\log n})$. In this analysis, we express the regret upper bound as an
optimization problem with respect to the learning rates and the coefficients of
certain negative terms, enabling refined analysis of the leading constants. We
then show that the existing social regret bound as well as these new social and
individual regret upper bounds cannot be further improved for optimistic Hedge
by providing algorithm-dependent individual regret lower bounds. Importantly,
these social regret upper and lower bounds match exactly, including the constant
factor in the leading term. Finally, building on these results, we improve the
last-iterate convergence rate and the dynamic regret of a learning dynamic
based on optimistic Hedge, and complement these bounds with algorithm-dependent
dynamic regret lower bounds that match the improved bounds.
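
For readers unfamiliar with the dynamic studied here, the following is a minimal sketch of optimistic Hedge self-play in a two-player zero-sum game, not the paper's own implementation. It assumes the common formulation in which each player weights actions by the exponentiated negative cumulative loss plus the most recent loss vector as a prediction. The function name `optimistic_hedge`, the helper `hedge_weights`, the payoff matrix `A`, and the constant learning rates `eta_x` and `eta_y` are all illustrative assumptions; in particular, the learning rates are not the tuned values analyzed in the paper.

```python
import math


def hedge_weights(scores, eta):
    """Return the distribution proportional to exp(-eta * score)."""
    low = min(scores)  # shift by the minimum for numerical stability
    weights = [math.exp(-eta * (s - low)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def optimistic_hedge(A, eta_x, eta_y, T):
    """Self-play for T rounds on the m-by-n matrix A (hypothetical helper).

    The x-player minimizes x^T A y and the y-player maximizes it.
    Returns the last iterates and each player's regret against the
    best fixed action in hindsight.
    """
    m, n = len(A), len(A[0])
    cum_x, cum_y = [0.0] * m, [0.0] * n    # cumulative loss vectors
    last_x, last_y = [0.0] * m, [0.0] * n  # most recent loss vectors
    paid_x = paid_y = 0.0                  # cumulative expected losses
    x, y = [1.0 / m] * m, [1.0 / n] * n
    for _ in range(T):
        # Optimistic step: cumulative loss plus the last loss as a prediction.
        x = hedge_weights([c + g for c, g in zip(cum_x, last_x)], eta_x)
        y = hedge_weights([c + g for c, g in zip(cum_y, last_y)], eta_y)
        # Loss vectors: A y for the x-player, -A^T x for the y-player.
        loss_x = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        loss_y = [-sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
        paid_x += sum(p * g for p, g in zip(x, loss_x))
        paid_y += sum(p * g for p, g in zip(y, loss_y))
        cum_x = [c + g for c, g in zip(cum_x, loss_x)]
        cum_y = [c + g for c, g in zip(cum_y, loss_y)]
        last_x, last_y = loss_x, loss_y
    regret_x = paid_x - min(cum_x)
    regret_y = paid_y - min(cum_y)
    return x, y, regret_x, regret_y


# Illustrative 2-by-3 game; the matrix and learning rates are assumptions.
A = [[0.0, 1.0, -1.0], [1.0, -1.0, 0.5]]
x, y, regret_x, regret_y = optimistic_hedge(A, eta_x=0.1, eta_y=0.1, T=500)
print("social regret:", regret_x + regret_y)
```

The social regret is the sum of the two individual regrets. Because the two players' expected losses cancel in every round, it is always nonnegative, whereas an individual regret may be negative.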