A Study on Inference Latency for Vision Transformers on Mobile Devices
2510.25166v1
cs.CV, cs.LG, cs.PF
2025-10-31
Authors:
Zhuojin Li, Marco Paolieri, Leana Golubchik
Abstract
Given the significant advances in machine learning techniques on mobile
devices, particularly in the domain of computer vision, in this work we
quantitatively study the performance characteristics of 190 real-world vision
transformers (ViTs) on mobile devices. Through a comparison with 102 real-world
convolutional neural networks (CNNs), we provide insights into the factors that
influence the latency of ViT architectures on mobile devices. Based on these
insights, we develop a dataset including measured latencies of 1000 synthetic
ViTs with representative building blocks and state-of-the-art architectures
from two machine learning frameworks and six mobile platforms. Using this
dataset, we show that inference latency of new ViTs can be predicted with
sufficient accuracy for real-world applications.