Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation
2510.08553v1
cs.CV, cs.AI, cs.RO
2025-10-11
Авторы:
Yunzhe Xu, Yiyuan Pan, Zhe Liu
Abstract
Vision-and-Language Navigation (VLN) requires agents to follow natural
language instructions through environments, with memory-persistent variants
demanding progressive improvement through accumulated experience. Existing
approaches for memory-persistent VLN face critical limitations: they lack
effective memory access mechanisms, instead relying on entire memory
incorporation or fixed-horizon lookup, and predominantly store only
environmental observations while neglecting navigation behavioral patterns that
encode valuable decision-making strategies. We present Memoir, which employs
imagination as a retrieval mechanism grounded by explicit memory: a world model
imagines future navigation states as queries to selectively retrieve relevant
environmental observations and behavioral histories. The approach comprises: 1)
a language-conditioned world model that imagines future states serving dual
purposes: encoding experiences for storage and generating retrieval queries; 2)
Hybrid Viewpoint-Level Memory that anchors both observations and behavioral
patterns to viewpoints, enabling hybrid retrieval; and 3) an
experience-augmented navigation model that integrates retrieved knowledge
through specialized encoders. Extensive evaluation across diverse
memory-persistent VLN benchmarks with 10 distinctive testing scenarios
demonstrates Memoir's effectiveness: significant improvements across all
scenarios, with 5.4% SPL gains on IR2R over the best memory-persistent
baseline, accompanied by 8.3x training speedup and 74% inference memory
reduction. The results validate that predictive retrieval of both environmental
and behavioral memories enables more effective navigation, with analysis
indicating substantial headroom (73.3% vs 93.4% upper bound) for this
imagination-guided paradigm. Code at https://github.com/xyz9911/Memoir.
Ссылки и действия
Дополнительные ресурсы: