Exploring the Hierarchical Reasoning Model for Small Natural-Image Classification Without Augmentation
2510.03598v1
cs.CV, cs.LG
2025-10-08
Авторы:
Alexander V. Mantzaris
Abstract
This paper asks whether the Hierarchical Reasoning Model (HRM) with the two
Transformer-style modules $(f_L,f_H)$, one step (DEQ-style) training, deep
supervision, Rotary Position Embeddings, and RMSNorm can serve as a practical
image classifier. It is evaluated on MNIST, CIFAR-10, and CIFAR-100 under a
deliberately raw regime: no data augmentation, identical optimizer family with
one-epoch warmup then cosine-floor decay, and label smoothing. HRM optimizes
stably and performs well on MNIST ($\approx 98\%$ test accuracy), but on small
natural images it overfits and generalizes poorly: on CIFAR-10, HRM reaches
65.0\% after 25 epochs, whereas a two-stage Conv--BN--ReLU baseline attains
77.2\% while training $\sim 30\times$ faster per epoch; on CIFAR-100, HRM
achieves only 29.7\% test accuracy despite 91.5\% train accuracy, while the
same CNN reaches 45.3\% test with 50.5\% train accuracy. Loss traces and error
analyses indicate healthy optimization but insufficient image-specific
inductive bias for HRM in this regime. It is concluded that, for
small-resolution image classification without augmentation, HRM is not
competitive with even simple convolutional architectures as the HRM currently
exist but this does not exclude possibilities that modifications to the model
may allow it to improve greatly.
Ссылки и действия
Дополнительные ресурсы: