Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report

2510.07092v1 cs.LG, cs.AI, cs.RO 2025-10-10

Authors:

Riccardo Mereu, Aidan Scannell, Yuxin Hou, Yi Zhao, Aditya Jitta, Antonio Dominguez, Luigi Acerbi, Amos Storkey, Paul Chang

Abstract

World models are a powerful paradigm in AI and robotics, enabling agents to reason about the future by predicting visual observations or compact latent states. The 1X World Model Challenge introduces an open-source benchmark of real-world humanoid interaction, with two complementary tracks: sampling, focused on forecasting future image frames, and compression, focused on predicting future discrete latent codes. For the sampling track, we adapt the video generation foundation model Wan-2.2 TI2V-5B to video-state-conditioned future frame prediction. We condition the video generation on robot states using AdaLN-Zero, and further post-train the model using LoRA. For the compression track, we train a Spatio-Temporal Transformer model from scratch. Our models achieve 23.0 dB PSNR in the sampling task and a Top-500 CE of 6.6386 in the compression task, securing 1st place in both challenges.
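The sampling-track result above is reported as PSNR in dB. As a minimal sketch of how that metric is generally computed, the function below applies the standard definition, 10 * log10(MAX^2 / MSE), to two flat sequences of pixel values. The function name `psnr`, the flat-list input, and the default 8-bit peak value of 255 are assumptions made for illustration; the abstract does not describe the challenge's exact evaluation protocol (resolution, colour space, or how frames are averaged).

```python
import math


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    max_val is the assumed peak pixel value (255 for 8-bit images).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        # Identical inputs: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Under the assumed 8-bit range, a PSNR of 23.0 dB corresponds to a root-mean-square pixel error of roughly 18 on a 0-255 scale.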
