Decentralized Asynchronous Multi-player Bandits
2509.25824v1
cs.LG, stat.ML
2025-10-02
Авторы:
Jingqi Fan, Canzhe Zhao, Shuai Li, Siwei Wang
Abstract
In recent years, multi-player multi-armed bandits (MP-MAB) have been
extensively studied due to their wide applications in cognitive radio networks
and Internet of Things systems. While most existing research on MP-MAB focuses
on synchronized settings, real-world systems are often decentralized and
asynchronous, where players may enter or leave the system at arbitrary times,
and do not have a global clock. This decentralized asynchronous setting
introduces two major challenges. First, without a global time, players cannot
implicitly coordinate their actions through time, making it difficult to avoid
collisions. Second, it is important to detect how many players are in the
system, but doing so may cost a lot. In this paper, we address the challenges
posed by such a fully asynchronous setting in a decentralized environment. We
develop a novel algorithm in which players adaptively change between
exploration and exploitation. During exploration, players uniformly pull their
arms, reducing the probability of collisions and effectively mitigating the
first challenge. Meanwhile, players continue pulling arms currently exploited
by others with a small probability, enabling them to detect when a player has
left, thereby addressing the second challenge. We prove that our algorithm
achieves a regret of $\mathcal{O}(\sqrt{T \log T} + {\log T}/{\Delta^2})$,
where $\Delta$ is the minimum expected reward gap between any two arms. To the
best of our knowledge, this is the first efficient MP-MAB algorithm in the
asynchronous and decentralized environment. Extensive experiments further
validate the effectiveness and robustness of our algorithm, demonstrating its
applicability to real-world scenarios.
Ссылки и действия
Дополнительные ресурсы: