Dissecting Transformers: A CLEAR Perspective towards Green AI
2510.02810v1
cs.LG, cs.AI, cs.SE
2025-10-07
Authors:
Hemang Jain, Shailender Goyal, Divyansh Pandey, Karthik Vaidhyanathan
Abstract
The rapid adoption of Large Language Models (LLMs) has raised significant
environmental concerns. Unlike the one-time cost of training, LLM inference
occurs continuously at a global scale and now dominates the AI energy
footprint. Yet, most sustainability studies report only coarse, model-level
metrics due to the lack of fine-grained measurement methods, treating energy
efficiency more as an afterthought than as a primary objective. We present the
first fine-grained empirical analysis of inference energy across core
components of transformer architecture. We propose a novel methodology,
Component-Level Energy Assessment via Repeated sampling (CLEAR), to overcome
the temporal mismatch between microsecond-scale component execution and
millisecond-scale energy sensor sampling. Using CLEAR, we evaluate 15 models
spanning four distinct architecture types, consistently keeping component-wise
energy variance below 9.5% while the individual components together account
for more than 90% of each model's total energy. Our empirical analysis reveals
that Attention blocks consume significantly more energy per floating-point
operation (FLOP), indicating that energy consumption is not proportional to
FLOP counts. FLOPs alone therefore fail to capture the true energy cost at the
component level. Our findings establish detailed component-level energy
baselines and, as an initial step, provide insight into building
energy-efficient transformer models through component-level optimizations.
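The abstract does not spell out CLEAR's procedure. The following is only a minimal Python sketch of the general idea behind repeated sampling: amortize a coarse energy sensor over many back-to-back repetitions of a short operation. It assumes a simulated cumulative energy counter that refreshes at a fixed interval under a constant power draw. The function names (`sensor_reading_j`, `estimate_energy_per_call_j`, `energy_per_flop_j`), the sensor model, and all parameter values are hypothetical and are not taken from the paper.

```python
import math


def sensor_reading_j(t_s: float, power_w: float, sensor_period_s: float) -> float:
    """Cumulative energy (J) reported at time t_s by a hypothetical simulated sensor.

    The sensor only refreshes every sensor_period_s seconds, so it reports the
    energy accumulated up to its most recent refresh, not up to t_s itself.
    A constant power draw power_w is assumed for simplicity.
    """
    last_refresh_s = math.floor(t_s / sensor_period_s) * sensor_period_s
    return power_w * last_refresh_s


def estimate_energy_per_call_j(
    call_duration_s: float,
    power_w: float,
    sensor_period_s: float,
    repeats: int,
    start_s: float = 0.0,
) -> float:
    """Estimate one call's energy from `repeats` back-to-back calls.

    With repeats=1, a microsecond-scale call usually falls between two sensor
    refreshes and reads as zero energy. With enough repeats, the measured window
    spans many refreshes, and the per-call quantization error is bounded by
    roughly sensor_period_s / (repeats * call_duration_s) in relative terms.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    before = sensor_reading_j(start_s, power_w, sensor_period_s)
    end_s = start_s + repeats * call_duration_s
    after = sensor_reading_j(end_s, power_w, sensor_period_s)
    return (after - before) / repeats


def energy_per_flop_j(energy_j: float, flops: float) -> float:
    """Energy per floating-point operation, the ratio used to compare components."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return energy_j / flops


if __name__ == "__main__":
    # Hypothetical 50 µs component call at a constant 300 W, 1 ms sensor refresh.
    true_j = 300.0 * 50e-6
    print(estimate_energy_per_call_j(50e-6, 300.0, 1e-3, repeats=1))  # 0.0
    print(estimate_energy_per_call_j(50e-6, 300.0, 1e-3, repeats=10_000), true_j)
```

A real measurement would read a hardware energy counter and also handle warm-up, device synchronization, and power that varies over time. This sketch models none of these.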