A Study on Inference Latency for Vision Transformers on Mobile Devices

2510.25166v1 cs.CV, cs.LG, cs.PF 2025-10-31

Authors:

Zhuojin Li, Marco Paolieri, Leana Golubchik

Abstract

Given the significant advances in machine learning techniques on mobile devices, particularly in the domain of computer vision, in this work we quantitatively study the performance characteristics of 190 real-world vision transformers (ViTs) on mobile devices. Through a comparison with 102 real-world convolutional neural networks (CNNs), we provide insights into the factors that influence the latency of ViT architectures on mobile devices. Based on these insights, we develop a dataset including measured latencies of 1000 synthetic ViTs with representative building blocks and state-of-the-art architectures from two machine learning frameworks and six mobile platforms. Using this dataset, we show that inference latency of new ViTs can be predicted with sufficient accuracy for real-world applications.
