FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow
2511.04768v1
cs.LG, cs.AR, cs.PL
2025-11-11
Авторы:
Rubens Lacouture, Nathan Zhang, Ritvik Sharma, Marco Siracusa, Fredrik Kjolstad, Kunle Olukotun, Olivia Hsu
Abstract
As deep learning models scale, sparse computation and specialized dataflow
hardware have emerged as powerful solutions to address efficiency. We propose
FuseFlow, a compiler that converts sparse machine learning models written in
PyTorch to fused sparse dataflow graphs for reconfigurable dataflow
architectures (RDAs). FuseFlow is the first compiler to support general
cross-expression fusion of sparse operations. In addition to fusion across
kernels (expressions), FuseFlow also supports optimizations like
parallelization, dataflow ordering, and sparsity blocking. It targets a
cycle-accurate dataflow simulator for microarchitectural analysis of fusion
strategies. We use FuseFlow for design-space exploration across four real-world
machine learning applications with sparsity, showing that full fusion (entire
cross-expression fusion across all computation in an end-to-end model) is not
always optimal for sparse models-fusion granularity depends on the model
itself. FuseFlow also provides a heuristic to identify and prune suboptimal
configurations. Using Fuseflow, we achieve performance improvements, including
a ~2.7x speedup over an unfused baseline for GPT-3 with BigBird block-sparse
attention.
Ссылки и действия
Дополнительные ресурсы: