From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction
2510.19654v1
cs.CV, cs.AI, cs.CL, cs.RO
2025-10-24
Авторы:
Zhida Zhao, Talas Fu, Yifan Wang, Lijun Wang, Huchuan Lu
Abstract
Despite remarkable progress in driving world models, their potential for
autonomous systems remains largely untapped: the world models are mostly
learned for world simulation and decoupled from trajectory planning. While
recent efforts aim to unify world modeling and planning in a single framework,
the synergistic facilitation mechanism of world modeling for planning still
requires further exploration. In this work, we introduce a new driving paradigm
named Policy World Model (PWM), which not only integrates world modeling and
trajectory planning within a unified architecture, but is also able to benefit
planning using the learned world knowledge through the proposed action-free
future state forecasting scheme. Through collaborative state-action prediction,
PWM can mimic the human-like anticipatory perception, yielding more reliable
planning performance. To facilitate the efficiency of video forecasting, we
further introduce a dynamically enhanced parallel token generation mechanism,
equipped with a context-guided tokenizer and an adaptive dynamic focal loss.
Despite utilizing only front camera input, our method matches or exceeds
state-of-the-art approaches that rely on multi-view and multi-modal inputs.
Code and model weights will be released at
https://github.com/6550Zhao/Policy-World-Model.