Conflict Adaptation in Vision-Language Models

2510.24804v1 cs.CV, cs.CL 2025-10-31

Авторы:

Xiaoyang Hu

Abstract

A signature of human cognitive control is conflict adaptation: improved performance on a high-conflict trial following another high-conflict trial. This phenomenon offers an account for how cognitive control, a scarce resource, is recruited. Using a sequential Stroop task, we find that 12 of 13 vision-language models (VLMs) tested exhibit behavior consistent with conflict adaptation, with the lone exception likely reflecting a ceiling effect. To understand the representational basis of this behavior, we use sparse autoencoders (SAEs) to identify task-relevant supernodes in InternVL 3.5 4B. Partially overlapping supernodes emerge for text and color in both early and late layers, and their relative sizes mirror the automaticity asymmetry between reading and color naming in humans. We further isolate a conflict-modulated supernode in layers 24-25 whose ablation significantly increases Stroop errors while minimally affecting congruent trials.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

Conflict Adaptation in Vision-Language Models

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Text-Only Training for Image Captioning with Retrieval Augmentation and Modality...

Generalized Medical Phrase Grounding

CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on...

Thinking with Programming Vision: Towards a Unified View for Thinking with Image...

See, Think, Learn: A Self-Taught Multimodal Reasoner

Навигация