SLOFetch: Compressed-Hierarchical Instruction Prefetching for Cloud Microservices

2511.04774v1 cs.LG, cs.AR 2025-11-11

Authors:

Liu Jiang, Zerui Bao, Shiqi Sheng, Di Zhu

Abstract

Large-scale networked services rely on deep software stacks and microservice orchestration, which increase instruction footprints and create frontend stalls that inflate tail latency and energy. We revisit instruction prefetching for these cloud workloads and present a design that aligns with SLO-driven and self-optimizing systems. Building on the Entangling Instruction Prefetcher (EIP), we introduce a Compressed Entry that captures up to eight destinations around a base using 36 bits by exploiting spatial clustering, and a Hierarchical Metadata Storage scheme that keeps only L1-resident and frequently queried entries on chip while virtualizing bulk metadata into lower levels. We further add a lightweight Online ML Controller that scores prefetch profitability using context features and a bandit-adjusted threshold. On data center applications, our approach preserves EIP-like speedups with smaller on-chip state and improves efficiency for networked services in the ML era.
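
The abstract does not spell out the bit layout of the Compressed Entry, so the following is only an illustrative sketch of how eight destinations clustered around a base could fit in 36 bits. It assumes a hypothetical layout: a 4-bit count of valid destinations followed by eight signed 4-bit cache-line offsets (4 + 8 × 4 = 36). The field widths and the names `pack_entry` and `unpack_entry` are assumptions for illustration, not the paper's encoding.

```python
from typing import List, Optional

# Hypothetical layout (not taken from the paper):
# [count: 4 bits][offset0: 4 bits] ... [offset7: 4 bits] = 36 bits
MAX_DESTS = 8      # "up to eight destinations" from the abstract
COUNT_BITS = 4     # assumed field: number of valid offsets (0..8)
OFFSET_BITS = 4    # assumed signed offset width, range -8..7 lines
ENTRY_BITS = COUNT_BITS + MAX_DESTS * OFFSET_BITS  # 36

OFFSET_MASK = (1 << OFFSET_BITS) - 1
OFFSET_MIN = -(1 << (OFFSET_BITS - 1))
OFFSET_MAX = (1 << (OFFSET_BITS - 1)) - 1


def pack_entry(base_line: int, dest_lines: List[int]) -> Optional[int]:
    """Pack destinations as signed line offsets from base_line.

    Returns None when the destinations do not fit the assumed layout,
    i.e. more than eight of them or one too far from the base.
    """
    if len(dest_lines) > MAX_DESTS:
        return None
    entry = len(dest_lines)
    for i, dest in enumerate(dest_lines):
        delta = dest - base_line
        if not OFFSET_MIN <= delta <= OFFSET_MAX:
            return None
        # Two's-complement truncation to OFFSET_BITS bits.
        entry |= (delta & OFFSET_MASK) << (COUNT_BITS + i * OFFSET_BITS)
    return entry


def unpack_entry(base_line: int, entry: int) -> List[int]:
    """Recover the destination line addresses from a packed entry."""
    count = entry & ((1 << COUNT_BITS) - 1)
    dests = []
    for i in range(count):
        raw = (entry >> (COUNT_BITS + i * OFFSET_BITS)) & OFFSET_MASK
        # Sign-extend the raw field back to a Python int.
        delta = raw - (1 << OFFSET_BITS) if raw > OFFSET_MAX else raw
        dests.append(base_line + delta)
    return dests
```

Under these assumptions, destinations that fall outside the offset window cannot share the entry, which is why the scheme depends on the spatial clustering the abstract mentions.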
