Human-in-the-Loop Bandwidth Estimation for Quality of Experience Optimization in Real-Time Video Communication
2510.12265v1
cs.MM, cs.AI, cs.NI, cs.SY, eess.SY
2025-10-16
Authors:
Sami Khairy, Gabriel Mittag, Vishak Gopal, Ross Cutler
Abstract
The quality of experience (QoE) delivered by video conferencing systems
depends significantly on accurate estimation of the time-varying available
bandwidth between the sender and receiver. Bandwidth estimation for real-time
communications remains an open challenge due to rapidly evolving network
architectures, increasingly complex protocol stacks, and the difficulty of
defining QoE metrics that reliably improve user experience. In this work, we
propose a deployed, human-in-the-loop, data-driven framework for bandwidth
estimation to address these challenges. Our approach begins with training
objective QoE reward models derived from subjective user evaluations to measure
audio and video quality in real-time video conferencing systems. Subsequently,
we collect roughly $1$M network traces with objective QoE rewards from
real-world Microsoft Teams calls to curate a bandwidth estimation training
dataset. We then introduce a novel distributional offline reinforcement
learning (RL) algorithm to train a neural-network-based bandwidth estimator
aimed at improving QoE for users. Our real-world A/B test demonstrates that the
proposed approach reduces the subjective poor call ratio by $11.41\%$ compared
to the baseline bandwidth estimator. Furthermore, the proposed offline RL
algorithm is benchmarked on D4RL tasks to demonstrate its generalization beyond
bandwidth estimation.