Benchmarking Multimodal Large Language Models for Face Recognition

2510.14866v1 cs.CV, cs.AI, cs.CL 2025-10-18

Authors:

Hatef Otroshi Shahreza, Sébastien Marcel

Abstract

Multimodal large language models (MLLMs) have achieved remarkable performance across diverse vision-and-language tasks. However, their potential in face recognition remains underexplored. In particular, the performance of open-source MLLMs needs to be evaluated and compared with existing face recognition models on standard benchmarks under a comparable protocol. In this work, we present a systematic benchmark of state-of-the-art MLLMs for face recognition on several face recognition datasets, including LFW, CALFW, CPLFW, CFP, AgeDB and RFW. Experimental results reveal that while MLLMs capture rich semantic cues useful for face-related tasks, they lag behind specialized models in high-precision recognition scenarios in zero-shot applications. This benchmark provides a foundation for advancing MLLM-based face recognition, offering insights for the design of next-generation models with higher accuracy and generalization. The source code of our benchmark is publicly available on the project page.
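
The abstract does not spell out the evaluation protocol, so the following is only an illustrative sketch, not the authors' benchmark code. It assumes the usual pair-based verification setup of datasets such as LFW (labeled pairs of images, scored by accuracy) and assumes the MLLM is asked a yes/no question about each pair. The names `ask_model`, `parse_yes_no`, `verification_accuracy` and the stub model are hypothetical.

```python
from typing import Callable, Iterable, Tuple

# One verification pair: two image paths and whether they show the same person.
Pair = Tuple[str, str, bool]


def parse_yes_no(answer: str) -> bool:
    """Map a free-form reply to a same-identity decision (assumed prompt format)."""
    return answer.strip().lower().startswith("yes")


def verification_accuracy(
    pairs: Iterable[Pair],
    ask_model: Callable[[str, str], str],
) -> float:
    """Fraction of pairs where the model's yes/no answer matches the label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs given")
    correct = sum(
        parse_yes_no(ask_model(img_a, img_b)) == same
        for img_a, img_b, same in pairs
    )
    return correct / len(pairs)


def stub_model(img_a: str, img_b: str) -> str:
    """Stand-in for a real MLLM call: compares the name prefix of each file."""
    same = img_a.split("_")[0] == img_b.split("_")[0]
    return "Yes, same person." if same else "No, different people."


# Hypothetical toy pairs; the last label is deliberately inconsistent with the stub.
toy_pairs = [
    ("anna_1.jpg", "anna_2.jpg", True),
    ("anna_1.jpg", "bob_1.jpg", False),
    ("bob_1.jpg", "bob_2.jpg", True),
    ("carl_1.jpg", "dana_1.jpg", True),
]

accuracy = verification_accuracy(toy_pairs, stub_model)
```

In a real run, `stub_model` would be replaced by a call that sends both images and a fixed prompt to the MLLM under test; how ambiguous or non-binary replies are handled is a design choice this sketch leaves out.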


