Benchmarking On-Device Machine Learning on Apple Silicon with MLX
2510.18921v1
cs.LG, cs.AI, cs.CL
2025-10-24
Authors:
Oluwaseun A. Ajayi, Ogundepo Odunayo
Abstract
The recent widespread adoption of Large Language Models (LLMs) and machine
learning in general has sparked research interest in exploring the
possibilities of deploying these models on smaller devices such as laptops and
mobile phones. This creates a need for frameworks and approaches that are
capable of taking advantage of on-device hardware. The MLX framework was
created to address this need. It is a framework optimized for machine learning
(ML) computations on Apple silicon devices, facilitating easier research,
experimentation, and prototyping.
This paper presents a performance evaluation of MLX, focusing on inference
latency of transformer models. We compare the performance of different
transformer architecture implementations in MLX with their PyTorch
counterparts. For this research we create a framework called MLX-Transformers,
which includes different transformer implementations in MLX, downloads the
model checkpoints in PyTorch format, and converts them to the MLX format. By
leveraging the advanced architecture and capabilities of Apple Silicon,
MLX-Transformers enables seamless execution of transformer models directly
sourced from Hugging Face, eliminating the need for the manual checkpoint
conversion often required when porting models between frameworks.
Our study benchmarks different transformer models on two Apple Silicon
MacBook devices against an NVIDIA CUDA GPU. Specifically, we compare the
inference latency of models with the same parameter sizes and
checkpoints. We evaluate the performance of BERT, RoBERTa, and XLM-RoBERTa
models, with the intention of extending future work to include models of
different modalities, thus providing a more comprehensive assessment of MLX's
capabilities. The results highlight MLX's potential in enabling efficient and
more accessible on-device ML applications within Apple's ecosystem.
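
To make the notion of inference latency concrete, the following is a minimal sketch of a framework-agnostic timing harness. It is an illustration under stated assumptions, not the measurement protocol used in this paper: the function name `measure_latency`, its parameters, and the warmup and repeat counts are all hypothetical. The optional `sync` callback matters because both frameworks can return before the computation has finished; in MLX one would typically force evaluation with `mx.eval` on the model output, and on a CUDA device one would call `torch.cuda.synchronize()`.

```python
import statistics
import time


def measure_latency(run_once, warmup=3, repeats=10, sync=None):
    """Time `run_once` and return latency statistics in milliseconds.

    `run_once` is a zero-argument callable performing one forward pass.
    `sync` is an optional zero-argument callable that blocks until all
    queued work has finished (hypothetical hook, see the lead-in).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    def timed_call():
        start = time.perf_counter()
        run_once()
        if sync is not None:
            sync()  # include pending asynchronous work in the measurement
        return (time.perf_counter() - start) * 1000.0

    # Warmup runs absorb one-time costs (compilation, caching); discard them.
    for _ in range(warmup):
        timed_call()

    samples = [timed_call() for _ in range(repeats)]
    return {
        "median_ms": statistics.median(samples),
        "mean_ms": statistics.fmean(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "runs": len(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real model forward pass.
    print(measure_latency(lambda: sum(i * i for i in range(10_000))))
```

Reporting the median alongside the minimum and maximum is a design choice that makes the result less sensitive to occasional slow runs than the mean alone. The same harness can wrap an MLX and a PyTorch forward pass on identical inputs, so that only `run_once` and `sync` differ between the two measurements.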