Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach
2510.21155v1
cs.DC, cs.AI, cs.LG
2025-10-28
Authors:
Dandan Liang, Jianing Zhang, Evan Chen, Zhe Li, Rui Li, Haibo Yang
Abstract
Split Federated Learning (SFL) enables scalable training on edge devices by
combining the parallelism of Federated Learning (FL) with the computational
offloading of Split Learning (SL). Despite its great success, SFL suffers
significantly from the well-known straggler issue in distributed learning
systems. This problem is exacerbated by the dependency between the Split Server
and clients: the Split Server's model update relies on receiving activations
from clients. This synchronization requirement introduces significant
latency, making stragglers a critical bottleneck to the scalability and
efficiency of the system. To mitigate this problem, we propose MU-SplitFed, a
straggler-resilient SFL algorithm based on zeroth-order optimization that decouples
training progress from straggler delays via a simple yet effective unbalanced
update mechanism.
By enabling the server to perform $\tau$ local updates per client round,
MU-SplitFed achieves a convergence rate of $O(\sqrt{d/(\tau T)})$ for
non-convex objectives, demonstrating a linear speedup in $\tau$ with respect to
communication rounds. Experiments show that MU-SplitFed consistently
outperforms baseline methods in the presence of stragglers and effectively
mitigates their impact through adaptive tuning of $\tau$. The code for this
project is available at https://github.com/Johnny-Zip/MU-SplitFed.
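
The unbalanced update idea can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); it is a toy example under stated assumptions: a two-layer split model with a tanh client layer and a linear server layer, a mean-squared-error loss, a standard two-point zeroth-order gradient estimator with a Gaussian direction, and a server-then-client update order. All function names and hyperparameters (`zo_grad`, `train_round`, `mu`, `lr_c`, `lr_s`) are hypothetical.

```python
import numpy as np


def zo_grad(loss_fn, theta, mu, rng):
    """Two-point zeroth-order gradient estimate along one random Gaussian direction."""
    u = rng.standard_normal(theta.shape)
    delta = loss_fn(theta + mu * u) - loss_fn(theta - mu * u)
    return (delta / (2.0 * mu)) * u


def client_forward(w_c, x):
    """Client-side part of the toy model: produces activations sent to the server."""
    return np.tanh(x @ w_c)


def server_loss(w_s, h, y):
    """Server-side part of the toy model: linear head plus mean squared error."""
    return float(np.mean((h @ w_s - y) ** 2))


def train_round(w_c, w_s, x, y, tau, lr_c, lr_s, mu, rng):
    """One client round with tau server updates (assumed ordering: server first)."""
    h = client_forward(w_c, x)  # a single upload of activations

    # Unbalanced part: the server reuses the same activations for tau updates,
    # so it keeps making progress without waiting for new client uploads.
    for _ in range(tau):
        w_s = w_s - lr_s * zo_grad(lambda w: server_loss(w, h, y), w_s, mu, rng)

    # The client takes one zeroth-order step using only scalar loss values
    # evaluated through the updated server model.
    w_c = w_c - lr_c * zo_grad(
        lambda w: server_loss(w_s, client_forward(w, x), y), w_c, mu, rng
    )
    return w_c, w_s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 5))
    y = rng.standard_normal(64)
    w_c = 0.1 * rng.standard_normal((5, 4))
    w_s = 0.1 * rng.standard_normal(4)
    for _ in range(200):
        w_c, w_s = train_round(w_c, w_s, x, y, tau=4, lr_c=0.01, lr_s=0.01, mu=1e-3, rng=rng)
    print("final loss:", server_loss(w_s, client_forward(w_c, x), y))
```

In this sketch, raising `tau` increases server-side work per round while leaving the number of client uploads unchanged, which is the quantity that stragglers delay.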