Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
2510.16926v1
cs.CV, cs.CL
2025-10-22
Авторы:
Chenxu Li, Zhicai Wang, Yuan Sheng, Xingyu Zhu, Yanbin Hao, Xiang Wang
Abstract
Multimodal Large Language Models (MLLMs) increasingly support dynamic image
resolutions. However, current evaluation paradigms primarily assess semantic
performance, overlooking the critical question of resolution robustness -
whether performance remains stable across varying input resolutions. To address
this gap, we introduce \textbf{Res-Bench}, a comprehensive benchmark comprising
14,400 samples across 12 resolution levels and six core capability dimensions.
We designed a novel evaluation framework that goes beyond traditional accuracy
metrics to capture performance stability. This framework introduces multiple
robustness metrics: Spearman's correlation for assessing resolution-performance
trends, and Absolute/Relative Continuous Error (ACE/RCE) for measuring
performance volatility. Using these metrics, we conducted a large-scale
evaluation of leading MLLMs. Our analysis encompasses: (1) model-centric and
task-centric robustness examination, (2) investigation of preprocessing
strategies including padding and super-resolution, and (3) exploration of
fine-tuning for stability enhancement.
Ссылки и действия
Дополнительные ресурсы: