Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets
2510.14097v1
cs.LG, cs.GT, math.OC, math.PR
2025-10-19
Авторы:
Zixian Yang, Sushil Mahavir Varma, Lei Ying
Abstract
We study a two-sided market, wherein, price-sensitive heterogeneous customers
and servers arrive and join their respective queues. A compatible
customer-server pair can then be matched by the platform, at which point, they
leave the system. Our objective is to design pricing and matching algorithms
that maximize the platform's profit, while maintaining reasonable queue
lengths. As the demand and supply curves governing the price-dependent arrival
rates may not be known in practice, we design a novel online-learning-based
pricing policy and establish its near-optimality. In particular, we prove a
tradeoff among three performance metrics: $\tilde{O}(T^{1-\gamma})$ regret,
$\tilde{O}(T^{\gamma/2})$ average queue length, and $\tilde{O}(T^{\gamma})$
maximum queue length for $\gamma \in (0, 1/6]$, significantly improving over
existing results [1]. Moreover, barring the permissible range of $\gamma$, we
show that this trade-off between regret and average queue length is optimal up
to logarithmic factors under a class of policies, matching the optimal one as
in [2] which assumes the demand and supply curves to be known. Our proposed
policy has two noteworthy features: a dynamic component that optimizes the
tradeoff between low regret and small queue lengths; and a probabilistic
component that resolves the tension between obtaining useful samples for fast
learning and maintaining small queue lengths.