Geo-R1: Unlocking VLM Geospatial Reasoning with Cross-View Reinforcement Learning
2510.00072v1
cs.CV, cs.AI, cs.LG
2025-10-04
Authors:
Chenhui Xu, Fuxun Yu, Michael J. Bianco, Jacob Kovarskiy, Raphael Tang, Qi Zhang, Zirui Xu, Will LeVine, Brandon Dubbs, Heming Liao, Cassandra Burgess, Suvam Bag, Jay Patravali, Rupanjali Kukal, Mikael Figueroa, Rishi Madhok, Nikolaos Karianakis, Jinjun Xiong
Abstract
We introduce Geo-R1, a reasoning-centric post-training framework that unlocks
geospatial reasoning in vision-language models by combining two stages: thinking
scaffolding and thinking elevating. In the scaffolding stage, Geo-R1 instills a
"geospatial thinking paradigm" via supervised fine-tuning on synthetic
chain-of-thought exemplars, enabling models to connect visual cues with
geographic priors without costly human reasoning annotations. In the elevating
stage, it uses GRPO-based reinforcement learning on a weakly-supervised
cross-view pairing proxy. This design supplies a verifiable and scalable reward
signal: teaching models to capture and reconcile features across modalities,
and harnessing reasoning for accurate prediction. Geo-R1 extends geospatial
modeling from domain pretraining / supervised fine-tuning to reasoning-first
post-training, and achieves state-of-the-art performance across various
geospatial reasoning benchmarks. Our model is available at
https://huggingface.co/miniHui/Geo-R1.
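
To make the elevating stage more concrete, here is a minimal sketch of how a verifiable cross-view pairing reward could feed a GRPO-style group-relative advantage. It is an illustration under stated assumptions, not the authors' implementation: the binary reward, the function names `pairing_reward` and `group_relative_advantages`, the use of the population standard deviation, and the `eps` value are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def pairing_reward(predicted_index: int, true_index: int) -> float:
    """Hypothetical verifiable reward: 1.0 if the model selected the
    candidate view that truly pairs with the query view, else 0.0."""
    return 1.0 if predicted_index == true_index else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style normalization: score each sampled response relative to
    the mean and standard deviation of its own group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers for one prompt whose correct candidate is index 2.
sampled_answers = [2, 0, 2, 3]
rewards = [pairing_reward(a, true_index=2) for a in sampled_answers]
advantages = group_relative_advantages(rewards)
```

In this sketch, responses that pick the correct pairing receive a positive advantage and the others a negative one. When every response in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.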