GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
2509.25178v1
cs.CV, cs.AI, cs.LG
2025-10-02
Authors:
Aryan Yazdan Parast, Parsa Hosseini, Hesam Asadollahzadeh, Arshia Soltani Moakhar, Basim Azam, Soheil Feizi, Naveed Akhtar
Abstract
Object hallucination in Multimodal Large Language Models (MLLMs) is a
persistent failure mode that causes the model to perceive objects absent in the
image. This weakness of MLLMs is currently studied using static benchmarks with
fixed visual scenarios, which precludes uncovering
model-specific or unanticipated hallucination vulnerabilities. We introduce
GHOST (Generating Hallucinations via Optimizing Stealth Tokens), a method
designed to stress-test MLLMs by actively generating images that induce
hallucination. GHOST is fully automatic and requires no human supervision or
prior knowledge. It operates by optimizing in the image embedding space to
mislead the model while keeping the target object absent, and then guiding a
diffusion model conditioned on the embedding to generate natural-looking
images. The resulting images remain visually natural and close to the original
input, yet introduce subtle misleading cues that cause the model to
hallucinate. We evaluate our method across a range of models, including
reasoning models like GLM-4.1V-Thinking, and achieve a hallucination success
rate exceeding 28%, compared to around 1% in prior data-driven discovery
methods. We confirm that the generated images are both high-quality and
object-free through quantitative metrics and human evaluation. GHOST also
uncovers transferable vulnerabilities: images optimized for Qwen2.5-VL induce
hallucinations in GPT-4o at a 66.5% rate. Finally, we show that fine-tuning on
our images mitigates hallucination, positioning GHOST as both a diagnostic and
corrective tool for building more reliable multimodal systems.
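
To make the first stage described in the abstract concrete (optimizing an image embedding so that the model reports an object while the object stays absent and the embedding stays close to the original), here is a minimal toy sketch. Everything in it is an assumption made for illustration: the "MLLM" and the "object detector" are hypothetical linear probes with hand-picked weights, the loss weights `lam_obj` and `lam_reg` are arbitrary, and the objective is only one plausible reading of the abstract, not the paper's actual formulation. The second stage (guiding a diffusion model conditioned on the optimized embedding) is not implemented here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical stand-ins, NOT the paper's models:
# W_MLLM scores how strongly a toy "MLLM" answers "yes, the object is there".
# W_OBJ scores how strongly a toy detector sees the object in the embedding.
# The two directions overlap on purpose, so fooling the model without
# inserting the object requires moving along the non-object direction.
W_MLLM = [0.6, 0.8, 0.0, 0.0]
W_OBJ = [1.0, 0.0, 0.0, 0.0]

def mllm_yes_prob(z):
    return sigmoid(dot(W_MLLM, z))

def object_score(z):
    return sigmoid(dot(W_OBJ, z))

def optimize_embedding(z0, lam_obj=1.0, lam_reg=0.05, lr=0.5, steps=300):
    """Gradient descent on an assumed three-term loss:
         -log p_yes(z)                    push the toy MLLM toward "yes"
         + lam_obj * softplus(W_OBJ . z)  keep the object absent
         + lam_reg * ||z - z0||^2         stay close to the original embedding
    """
    z = list(z0)
    for _ in range(steps):
        p_yes = mllm_yes_prob(z)
        p_obj = object_score(z)
        for i in range(len(z)):
            grad_i = (
                -(1.0 - p_yes) * W_MLLM[i]       # d/dz of -log sigmoid(W_MLLM . z)
                + lam_obj * p_obj * W_OBJ[i]     # d/dz of softplus(W_OBJ . z)
                + 2.0 * lam_reg * (z[i] - z0[i]) # d/dz of the proximity term
            )
            z[i] -= lr * grad_i
        # A real pipeline would now hand z to an embedding-conditioned
        # diffusion model to render an image; that step is omitted.
    return z

z0 = [-1.0, 0.0, 0.0, 0.0]  # toy embedding of an object-free image
z_adv = optimize_embedding(z0)
```

Comparing `mllm_yes_prob(z0)` with `mllm_yes_prob(z_adv)`, and checking that `object_score(z_adv)` stays below 0.5, shows the intended effect in this toy setting: the "yes" probability rises while the detector still treats the object as absent. The proximity term is what keeps the optimized embedding near the original, mirroring the abstract's claim that the generated images remain close to the input.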