Energy Consumption in Parallel Neural Network Training
2508.07706v1
cs.LG, cs.AI
2025-08-13
Authors:
Philipp Huber, David Li, Juan Pedro Gutiérrez Hermosillo Muriedas, Deifilia Kieckhefen, Markus Götz, Achim Streit, Charlotte Debus
Summary
## Context
The demand for computational power in training neural networks has surged, driven by advances in model architectures and ever-larger datasets. This growth has substantially increased energy consumption, raising concerns about the environmental cost and sustainability of AI research. Parallelization has enabled larger models and datasets and faster training, but its effect on energy consumption remains insufficiently understood. This study addresses that gap by investigating how the parallelization parameters of data-parallel training (GPU count, global batch size, and local batch size) affect energy consumption, predictive performance, and training time. It focuses on the training of two models, ResNet50 and FourCastNet.
## Method
The study employed scaling experiments to evaluate the impact of parallelization parameters on the training of ResNet50 and FourCastNet models. These experiments involved varying the number of GPUs, global batch sizes, and local batch sizes to analyze their influence on training time, predictive performance, and energy consumption. The experiments were conducted on high-performance computing infrastructure, ensuring reliable and reproducible results. The metrics included energy usage per GPU hour, training time, and model accuracy. By systematically analyzing these factors, the authors aimed to uncover the complex interplay between parallelization parameters and their effect on energy efficiency in neural network training.
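The three parallelization parameters are not independent. As a minimal sketch, assuming plain data-parallel training without gradient accumulation, the global batch size is the GPU count times the local batch size, so a sweep can fix one quantity and vary the others. The `ParallelConfig` class and both sweep helpers are hypothetical names introduced for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """One data-parallel setup (hypothetical helper, not from the paper)."""

    gpu_count: int
    local_batch_size: int  # samples processed per GPU per step

    @property
    def global_batch_size(self) -> int:
        # Assumption: no gradient accumulation, so each optimizer step
        # consumes one local batch on every GPU.
        return self.gpu_count * self.local_batch_size

    def updates_per_epoch(self, dataset_size: int) -> int:
        # Ceiling division: a final partial batch still counts as one update.
        return -(-dataset_size // self.global_batch_size)


def sweep_fixed_global_batch(global_batch_size, gpu_counts):
    """Keep the global batch fixed; the local batch shrinks as GPUs are added."""
    return [
        ParallelConfig(n, global_batch_size // n)
        for n in gpu_counts
        if global_batch_size % n == 0  # skip GPU counts that do not divide evenly
    ]


def sweep_fixed_local_batch(local_batch_size, gpu_counts):
    """Keep the local batch fixed; the global batch grows with the GPU count."""
    return [ParallelConfig(n, local_batch_size) for n in gpu_counts]
```

With a fixed global batch, adding GPUs leaves the number of gradient updates per epoch unchanged but reduces the work per GPU per step; with a fixed local batch, adding GPUs reduces the number of updates per epoch instead.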
## Results
The experiments revealed that energy consumption scales approximately linearly with consumed GPU hours, but that the scaling factor differs substantially between models and hardware. For ResNet50, increasing the global batch size led to a more efficient use of resources, with a smaller increase in energy consumption relative to the increase in GPU hours. In contrast, FourCastNet demonstrated a more complex relationship, with local batch size playing a more critical role in determining energy efficiency. The results also showed that the scaling factor is systematically influenced by the number of samples and gradient updates processed per GPU hour. These findings clarify which factors drive energy use in neural network training and highlight the importance of choosing parallelization parameters with energy in mind.
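The approximately linear relationship between energy and GPU hours can be summarized by a single scaling factor per model and hardware setup. The sketch below estimates that factor as the least-squares slope of a line through the origin and computes the two throughput quantities named above. The function names, the choice of kWh as the unit, and the through-the-origin model are assumptions made for illustration, not the authors' analysis code.

```python
def fit_scaling_factor(gpu_hours, energy_kwh):
    """Least-squares slope k of energy = k * gpu_hours (line through the origin).

    Assumption: zero GPU hours means zero energy, so no intercept is fitted.
    """
    if len(gpu_hours) != len(energy_kwh) or not gpu_hours:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(h * e for h, e in zip(gpu_hours, energy_kwh))
    denominator = sum(h * h for h in gpu_hours)
    if denominator == 0:
        raise ValueError("at least one run must have non-zero GPU hours")
    return numerator / denominator


def samples_per_gpu_hour(samples_processed, gpu_hours):
    """Training samples processed per consumed GPU hour."""
    return samples_processed / gpu_hours


def updates_per_gpu_hour(gradient_updates, gpu_hours):
    """Gradient updates performed per consumed GPU hour."""
    return gradient_updates / gpu_hours


if __name__ == "__main__":
    # Made-up illustrative measurements, NOT results from the paper.
    hours = [10.0, 20.0, 40.0]
    energy = [3.0, 6.0, 12.0]
    k = fit_scaling_factor(hours, energy)
    print(f"estimated scaling factor: {k:.3f} kWh per GPU hour")
```

Fitting one factor per training setup and comparing it against samples and updates per GPU hour is one simple way to look for the kind of systematic dependence the study reports.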
## Significance
The insights from this study are highly relevant for improving the sustainability of AI research. By quantifying the energy costs associated with parallelization, the research provides a foundation for developing more energy-efficient training strategies. The findings have practical applications in optimizing GPU utilization, reducing energy consumption, and minimizing the environmental impact of neural network training. Furthermore, the study informs the design of future hardware and software solutions tailored to the specific needs of energy-efficient AI training. The results contribute to the broader goal of making AI more sustainable and environmentally friendly.
## Conclusions
The study underscores the importance of understanding the energy dynamics in neural network training and the critical role of parallelization parameters in influencing energy consumption. It demonstrates that while parallelization accelerates training and enables the handling of larger datasets, it also introduces significant energy costs that must be carefully managed. The findings provide a basis for future research into more sustainable AI practices, including the development of energy-efficient algorithms and hardware. By addressing the challenges of energy use in neural network training, this research advances the field towards a more sustainable and responsible use of AI technologies.
Abstract
The increasing demand for computational resources of training neural networks
leads to a concerning growth in energy consumption. While parallelization has
enabled upscaling model and dataset sizes and accelerated training, its impact
on energy consumption is often overlooked. To close this research gap, we
conducted scaling experiments for data-parallel training of two models,
ResNet50 and FourCastNet, and evaluated the impact of parallelization
parameters, i.e., GPU count, global batch size, and local batch size, on
predictive performance, training time, and energy consumption. We show that
energy consumption scales approximately linearly with the consumed resources,
i.e., GPU hours; however, the respective scaling factor differs substantially
between distinct model trainings and hardware, and is systematically influenced
by the number of samples and gradient updates per GPU hour. Our results shed
light on the complex interplay of scaling up neural network training and can
inform future developments towards more sustainable AI research.