MIRAGE: Agentic Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning
2510.17590v1
cs.AI, cs.CL, cs.CV, cs.CY, cs.LG, I.2.7; H.3.3; I.4.9
2025-10-22
Авторы:
Mir Nafis Sharear Shopnil, Sharad Duwal, Abhishek Tyagi, Adiba Mahbub Proma
Abstract
Misinformation spreads across web platforms through billions of daily
multimodal posts that combine text and images, overwhelming manual
fact-checking capacity. Supervised detection models require domain-specific
training data and fail to generalize across diverse manipulation tactics. We
present MIRAGE, an inference-time, model-pluggable agentic framework that
decomposes multimodal verification into four sequential modules: visual
veracity assessment detects AI-generated images, cross-modal consistency
analysis identifies out-of-context repurposing, retrieval-augmented factual
checking grounds claims in web evidence through iterative question generation,
and a calibrated judgment module integrates all signals. MIRAGE orchestrates
vision-language model reasoning with targeted web retrieval, outputs structured
and citation-linked rationales. On MMFakeBench validation set (1,000 samples),
MIRAGE with GPT-4o-mini achieves 81.65% F1 and 75.1% accuracy, outperforming
the strongest zero-shot baseline (GPT-4V with MMD-Agent at 74.0% F1) by 7.65
points while maintaining 34.3% false positive rate versus 97.3% for a
judge-only baseline. Test set results (5,000 samples) confirm generalization
with 81.44% F1 and 75.08% accuracy. Ablation studies show visual verification
contributes 5.18 F1 points and retrieval-augmented reasoning contributes 2.97
points. Our results demonstrate that decomposed agentic reasoning with web
retrieval can match supervised detector performance without domain-specific
training, enabling misinformation detection across modalities where labeled
data remains scarce.