Multi-Scale High-Resolution Logarithmic Grapher Module for Efficient Vision GNNs
2510.13740v1
cs.CV, cs.AI, cs.LG
2025-10-17
Authors:
Mustafa Munir, Alex Zhang, Radu Marculescu
Abstract
Vision graph neural networks (ViG) have demonstrated promise in vision tasks
as a competitive alternative to conventional convolutional neural nets (CNN)
and transformers (ViTs); however, common graph construction methods, such as
k-nearest neighbor (KNN), can be expensive on larger images. While methods such
as Sparse Vision Graph Attention (SVGA) have shown promise, SVGA's fixed step
scale can lead to over-squashing and can require multiple connections to gather
the same information that a single long-range link could provide. Based on this
observation, we propose a new graph construction method, Logarithmic Scalable
Graph Construction (LSGC), which enhances performance by limiting the number of
long-range links. We then propose LogViG, a novel hybrid CNN-GNN model
that utilizes LSGC. Furthermore, inspired by the successes of multi-scale and
high-resolution architectures, we introduce and apply a high-resolution branch
and fuse features between our high-resolution and low-resolution branches for a
multi-scale high-resolution Vision GNN network. Extensive experiments show that
LogViG beats existing ViG, CNN, and ViT architectures in terms of accuracy,
GMACs, and parameters on image classification and semantic segmentation tasks.
Our smallest model, Ti-LogViG, achieves an average top-1 accuracy on
ImageNet-1K of 79.9% with a standard deviation of 0.2%, 1.7% higher average
accuracy than Vision GNN with a 24.3% reduction in parameters and 35.3%
reduction in GMACs. Our work shows that leveraging long-range links in graph
construction for ViGs through our proposed LSGC can exceed the performance of
current state-of-the-art ViGs. Code is available at
https://github.com/mmunir127/LogViG-Official.
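To make the contrast between a fixed step scale and logarithmic spacing concrete, the sketch below counts the row and column neighbors one patch receives on a 56x56 patch grid. It is a minimal illustration under stated assumptions, not the authors' LSGC: the SVGA-style rule is assumed to link every `step`-th patch along the row and column with wrap-around, and the "logarithmic" rule is assumed to use offsets 1, 2, 4, 8, ... (powers of two). The helper names `fixed_step_offsets`, `log_step_offsets`, and `grid_neighbors` are hypothetical. For comparison, a KNN graph would have to score all other 3,135 patches before selecting neighbors.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def fixed_step_offsets(length: int, step: int) -> List[int]:
    # Assumed SVGA-style rule: every `step`-th position along one axis.
    return list(range(step, length, step))


def log_step_offsets(length: int, base: int = 2) -> List[int]:
    # Hypothetical logarithmic rule: offsets base**0, base**1, ... below `length`.
    offsets, d = [], 1
    while d < length:
        offsets.append(d)
        d *= base
    return offsets


def grid_neighbors(row: int, col: int, height: int, width: int,
                   offsets_h: List[int], offsets_w: List[int]) -> List[Coord]:
    # Link a patch to patches in the same column (offsets_h) and the same row
    # (offsets_w), wrapping around the grid border.
    nbrs = [((row + d) % height, col) for d in offsets_h]
    nbrs += [(row, (col + d) % width) for d in offsets_w]
    return nbrs


H = W = 56  # patch grid size chosen for illustration only
svga = grid_neighbors(0, 0, H, W, fixed_step_offsets(H, 2), fixed_step_offsets(W, 2))
logg = grid_neighbors(0, 0, H, W, log_step_offsets(H), log_step_offsets(W))
print(len(svga), len(logg), H * W - 1)  # prints: 54 12 3135
```

Under these assumptions, the fixed step of 2 yields 27 links per axis, while power-of-two offsets yield only 6 per axis. The logarithmic rule still reaches a patch 32 positions away in a single hop, which is the kind of long-range link the abstract argues a fixed step scale can miss.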