Taming the Tail: NoI Topology Synthesis for Mixed DL Workloads on Chiplet-Based Accelerators

2510.24113v1 cs.AR, cs.AI, cs.LG 2025-10-30

Authors:

Arnav Shukla, Harsh Sharma, Srikant Bharadwaj, Vinayak Abrol, Sujay Deb

Abstract

Heterogeneous chiplet-based systems improve scaling by disaggregating CPUs/GPUs and emerging technologies (HBM/DRAM). However, this on-package disaggregation introduces latency in the Network-on-Interposer (NoI). We observe that in modern large-model inference, parameters and activations routinely move back and forth from HBM/DRAM, injecting large, bursty flows into the interposer. These memory-driven transfers inflate tail latency and violate Service Level Agreements (SLAs) across k-ary n-cube baseline NoI topologies. To address this gap, we introduce an Interference Score (IS) that quantifies worst-case slowdown under contention. We then formulate NoI synthesis as a multi-objective optimization (MOO) problem. We develop PARL (Partition-Aware Reinforcement Learner), a topology generator that balances throughput, latency, and power. PARL-generated topologies reduce contention at the memory cut, meet SLAs, and cut worst-case slowdown to 1.2 times while maintaining competitive mean throughput relative to link-rich meshes. Overall, this reframes NoI design for heterogeneous chiplet accelerators with workload-aware objectives.
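
The abstract does not give the formula for the Interference Score. As a rough illustration of what "worst-case slowdown under contention" could mean, the sketch below takes the largest ratio of contended latency to isolated latency across traffic flows. The function name, the flow names, and the latency values are all hypothetical, and the paper's actual definition may differ.

```python
def worst_case_slowdown(isolated_latency, contended_latency):
    """Largest contended/isolated latency ratio across flows.

    Illustrative assumption only: this is not the paper's definition
    of the Interference Score.
    """
    if isolated_latency.keys() != contended_latency.keys():
        raise ValueError("both mappings must describe the same flows")
    return max(
        contended_latency[flow] / isolated_latency[flow]
        for flow in isolated_latency
    )


# Hypothetical per-flow latencies in cycles (made-up numbers).
isolated = {"compute": 100.0, "hbm_read": 200.0}
contended = {"compute": 110.0, "hbm_read": 240.0}

score = worst_case_slowdown(isolated, contended)
print(f"worst-case slowdown: {score:.2f}x")
```

Under this reading, a value of 1.2 means the most affected flow runs 20% slower when sharing the interposer than when running alone.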


