Microscaling Floating Point Formats for Large Language Models

arXiv:2510.01863v1 | cs.NE, cs.LG | 2025-10-04

Authors:

Marco Cococcioni, Dario Pagani, Federico Rossi

Abstract

The increasing computational and memory demands of large language models (LLMs) necessitate innovative approaches to optimize resource usage without compromising performance. This paper leverages microscaling floating-point formats, a novel technique designed to address these challenges by reducing the storage and computational overhead associated with numerical representations in LLMs. Unlike traditional floating-point representations that allocate a dedicated scale for each value, microscaling employs a shared scale across a block of values, enabling compact one-byte floating-point representations while maintaining an extended dynamic range. We explore the application of microscaling in the context of 8-bit floating-point formats to significantly reduce memory footprint and computational costs. We tested several configurations of microscaling floats within the GPT-2 LLM architecture, demonstrating that microscaling data formats can achieve competitive accuracy during training and inference, proving their efficacy as a resource-efficient alternative for deploying LLMs at scale. The source code is publicly available at: https://github.com/unipi-dii-compressedarith/llm.c-sve
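The core idea of the abstract, one shared scale per block of one-byte values, can be illustrated with a small pure-Python sketch. It is not the authors' code (their implementation is in the linked llm.c-sve repository). It assumes the MXFP8 layout described in the OCP Microscaling Formats (MX) v1.0 specification: blocks of 32 elements, a shared power-of-two scale stored as an 8-bit E8M0 exponent, and FP8 E4M3 elements. The helper names (`round_to_e4m3`, `mx_quantize_block`, `mx_roundtrip`, `effective_bits_per_element`) are hypothetical. Saturating out-of-range elements at ±448 and assuming finite inputs are simplifying choices of this sketch.

```python
import math

# Hypothetical parameters for this sketch, modelled on the OCP MX MXFP8 (E4M3)
# layout: 32 elements share one 8-bit power-of-two (E8M0) scale.
BLOCK_SIZE = 32
E4M3_MAX = 448.0          # largest finite E4M3 magnitude (1.75 * 2**8)
E4M3_EMAX = 8             # exponent of the largest E4M3 binade
E4M3_EMIN = -6            # exponent of the smallest normal E4M3 value
E4M3_MANT_BITS = 3
E8M0_MIN, E8M0_MAX = -127, 127   # representable scale exponents


def round_to_e4m3(x: float) -> float:
    """Round a finite x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_EMIN)        # binade exponent; subnormals reuse EMIN spacing
    ulp_exp = exp - E4M3_MANT_BITS     # log2 of the spacing between neighbours
    steps = round(math.ldexp(abs(x), -ulp_exp))  # round() on a float is ties-to-even
    q = min(math.ldexp(steps, ulp_exp), E4M3_MAX)
    return math.copysign(q, x)


def mx_quantize_block(values):
    """Encode one block as (shared scale exponent, list of E4M3 element values)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return E8M0_MIN, [0.0] * len(values)
    _, e = math.frexp(amax)
    # Align the block maximum with the top E4M3 binade, then clamp to E8M0 range.
    shared_exp = min(max(e - 1 - E4M3_EMAX, E8M0_MIN), E8M0_MAX)
    elems = [round_to_e4m3(math.ldexp(v, -shared_exp)) for v in values]
    return shared_exp, elems


def mx_dequantize_block(shared_exp, elems):
    """Decode: every element in the block is multiplied by the same 2**shared_exp."""
    return [math.ldexp(p, shared_exp) for p in elems]


def mx_roundtrip(tensor):
    """Quantize and dequantize a flat list of floats block by block."""
    out = []
    for i in range(0, len(tensor), BLOCK_SIZE):
        scale_exp, elems = mx_quantize_block(tensor[i:i + BLOCK_SIZE])
        out.extend(mx_dequantize_block(scale_exp, elems))
    return out


def effective_bits_per_element(elem_bits=8, scale_bits=8, block_size=BLOCK_SIZE):
    """Storage cost per value once the shared scale is amortized over the block."""
    return elem_bits + scale_bits / block_size


tiny = 3.0 * 2**-100
print(round_to_e4m3(tiny))          # plain E4M3 flushes this value to zero
print(mx_roundtrip([tiny]) == [tiny])
print(effective_bits_per_element())
```

Because the shared scale is a power of two, decoding is only an exponent shift, and an 8-bit scale spread over 32 elements adds 0.25 bits per value, giving 8.25 bits per element instead of 32 for FP32. The value `3 * 2**-100` illustrates the extended dynamic range: on its own it underflows to zero in E4M3, but the block scale moves it into the representable range so it survives the round trip exactly.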
