MMAO-Bench: MultiModal All in One Benchmark Reveals Compositional Law between Uni-modal and Omni-modal in OmniModels
2510.18915v1
cs.CL, cs.AI, I.2.7
2025-10-24
Authors:
Chen Chen, ZeYang Hu, Fengjiao Chen, Liya Ma, Jiaxing Liu, Xiaoyu Li, Xuezhi Cao
Abstract
Multimodal large language models have been progressing from uni-modal
understanding toward unifying the visual, audio, and language modalities;
such unified models are collectively termed omni models. However, the
correlation between uni-modal and omni-modal capabilities remains unclear,
and comprehensive evaluation is needed to drive the evolution of omni-model
intelligence. In this work, we propose a novel, high-quality, and diverse
omni-model benchmark, the MultiModal All in One Benchmark (MMAO-Bench),
which effectively assesses both uni-modal and omni-modal understanding
capabilities. The benchmark consists of 1880 human-curated samples across
44 task types, and it introduces an innovative multi-step open-ended
question type that better assesses complex reasoning tasks. Experimental
results reveal a compositional law between cross-modal and uni-modal
performance, and show that omni-modal capability manifests as a bottleneck
effect on weak models while exhibiting synergistic promotion on strong
models.