GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards
Kyeongjin Ahn, Seungeon Lee, Krishna P. Gummadi, Meeyoung Cha
- Published
- May 19, 2026 — 15:37 UTC
Problem
This paper addresses the gap in geospatial reasoning capabilities, particularly the challenge of developing models that can effectively solve image-grounded spatial problems without relying on extensive human-annotated datasets. The authors highlight the combinatorial nature of the question space in geospatial tasks, which complicates the annotation process. The work is presented as a preprint and has not yet undergone peer review.
Method
The authors introduce GeoX, a self-play framework that leverages reinforcement learning to acquire spatial logic through executable programs. The architecture employs a single multimodal policy that generates spatial reasoning problems as executable programs, which are then solved using three reasoning modes: abduction, deduction, and induction. The framework integrates an image understanding tool to process satellite or aerial images and a verifier that executes the generated programs to provide verifiable rewards. This dual-role optimization allows the model to learn effectively from the reward signals generated during self-play. The training process is designed to enhance the model’s performance on geospatial reasoning tasks without the need for large-scale human-curated datasets.
Results
GeoX demonstrates significant improvements over baseline models, achieving an average increase of up to 5.5 points on relevant benchmarks. The authors compare their results against conventional baselines that have been trained on millions of curated data points, showing that GeoX can match or exceed these models in performance. Specific benchmark metrics and comparisons are not detailed in the abstract, but the results indicate a robust capability in geospatial reasoning tasks.
Limitations
The authors acknowledge that the reliance on self-play may introduce biases based on the initial conditions and the design of the executable programs. Additionally, while the framework shows promise, it may not generalize well to all types of geospatial reasoning tasks, particularly those that require nuanced human-like understanding or contextual knowledge not captured in the training data. The paper does not discuss the computational resources required for training or the scalability of the approach in real-world applications.
Why it matters
The implications of this work are significant for advancing geospatial reasoning in AI. By demonstrating that self-play and verifiable rewards can effectively substitute for large annotated datasets, GeoX opens new avenues for research in unsupervised and semi-supervised learning paradigms. This approach could lead to more efficient training methodologies for models in various domains, including autonomous navigation, environmental monitoring, and urban planning. Furthermore, the release of a benchmark for geospatial understanding accumulated through self-play provides a valuable resource for future research in this area.
Authors: Kyeongjin Ahn, Seungeon Lee, Krishna P. Gummadi, Meeyoung Cha
Source: arXiv:2605.20006
URL: https://arxiv.org/abs/2605.20006v1
By Turing Wire editorial staff · May 19, 2026 · Editorial standards →
Source: arXiv cs.AI