Benchmarking Multimodal Large Language Models for Face Recognition
2510.14866v1
cs.CV, cs.AI, cs.CL
2025-10-18
Авторы:
Hatef Otroshi Shahreza, Sébastien Marcel
Abstract
Multimodal large language models (MLLMs) have achieved remarkable performance
across diverse vision-and-language tasks. However, their potential in face
recognition remains underexplored. In particular, the performance of
open-source MLLMs needs to be evaluated and compared with existing face
recognition models on standard benchmarks with similar protocol. In this work,
we present a systematic benchmark of state-of-the-art MLLMs for face
recognition on several face recognition datasets, including LFW, CALFW, CPLFW,
CFP, AgeDB and RFW. Experimental results reveal that while MLLMs capture rich
semantic cues useful for face-related tasks, they lag behind specialized models
in high-precision recognition scenarios in zero-shot applications. This
benchmark provides a foundation for advancing MLLM-based face recognition,
offering insights for the design of next-generation models with higher accuracy
and generalization. The source code of our benchmark is publicly available in
the project page.
Ссылки и действия
Дополнительные ресурсы: