Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
2510.20229v1
cs.CV, cs.AI, cs.CL
2025-10-25
Авторы:
Ge Zheng, Jiaye Qian, Jiajin Tang, Sibei Yang
Abstract
Large Vision-Language Models (LVLMs) have made significant progress in recent
years but are also prone to hallucination issues. They exhibit more
hallucinations in longer, free-form responses, often attributed to accumulated
uncertainties. In this paper, we ask: Does increased hallucination result
solely from length-induced errors, or is there a deeper underlying mechanism?
After a series of preliminary experiments and findings, we suggest that the
risk of hallucinations is not caused by length itself but by the increased
reliance on context for coherence and completeness in longer responses.
Building on these insights, we propose a novel "induce-detect-suppress"
framework that actively induces hallucinations through deliberately designed
contexts, leverages induced instances for early detection of high-risk cases,
and ultimately suppresses potential object-level hallucinations during actual
decoding. Our approach achieves consistent, significant improvements across all
benchmarks, demonstrating its efficacy. The strong detection and improved
hallucination mitigation not only validate our framework but, more importantly,
re-validate our hypothesis on context. Rather than solely pursuing performance
gains, this study aims to provide new insights and serves as a first step
toward a deeper exploration of hallucinations in LVLMs' longer responses.
Ссылки и действия
Дополнительные ресурсы: