COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

Youwei Liu, Jian Wang, Hanlin Wang, Wenjie Li

Published: Jun 1, 2026 — 15:21 UTC

Problem
The paper targets two limitations. Existing textual world models are typically frozen after training, so they fail to track the shifting state-action distributions that an evolving agent produces. Agent-improvement methods, in turn, often depend on external rewards or verifiers, which limits their usefulness in realistic interactive settings. The work is a preprint and has not been peer reviewed.

Method
The authors propose COMAP, a framework that co-evolves a textual world model and an agent policy in a closed loop. At each decision step, the world model predicts the state feedback that candidate actions would produce. The agent then performs future-aware reflection: it judges how reliable that feedback is and refines its action accordingly. The resulting on-policy trajectories are used to update the world model via self-distillation, so that it tracks the agent’s evolving interaction distribution. The approach combines reinforcement learning and self-supervised learning techniques so that both the world model and the agent policy keep adapting.
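The loop described above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the line-world environment, the tabular `ToyWorldModel`, the `choose_action` reflection rule, and every other name here are hypothetical stand-ins. In particular, a lookup-table overwrite replaces self-distillation of a textual model, and the policy is a fixed random proposer rather than one trained with reinforcement learning. The sketch only shows the control flow: predict feedback, reflect on whether it can be trusted, act, then update the world model on the agent's own trajectories.

```python
import random
from dataclasses import dataclass, field

GOAL = 4            # hypothetical goal state on a line of states 0..GOAL
ACTIONS = (-1, +1)  # move left or right


def env_step(state, action):
    """Ground-truth environment: move along the line, clipped to [0, GOAL]."""
    return max(0, min(GOAL, state + action))


@dataclass
class ToyWorldModel:
    """Tabular stand-in for a textual world model (assumption, not from the paper)."""
    table: dict = field(default_factory=dict)

    def predict(self, state, action):
        # Returns (predicted_next_state, seen); `seen` is a crude reliability signal.
        key = (state, action)
        return self.table.get(key), key in self.table

    def update(self, trajectory):
        # Stand-in for self-distillation: store feedback observed on-policy.
        for state, action, next_state in trajectory:
            self.table[(state, action)] = next_state


def choose_action(state, world_model, rng):
    """Propose an action, then reflect on the world model's predicted feedback."""
    proposed = rng.choice(ACTIONS)
    predicted, seen = world_model.predict(state, proposed)
    # Reflection: trust only predictions for pairs seen before; if the predicted
    # outcome does not move closer to the goal, switch to the other action.
    if seen and abs(GOAL - predicted) >= abs(GOAL - state):
        return next(a for a in ACTIONS if a != proposed)
    return proposed


def run_episode(world_model, rng, max_steps=50):
    """Roll out one on-policy trajectory of (state, action, next_state) tuples."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = choose_action(state, world_model, rng)
        next_state = env_step(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
        if state == GOAL:
            break
    return trajectory


def co_evolve(rounds=5, seed=0):
    """Alternate between acting with the world model and updating it."""
    rng = random.Random(seed)
    world_model = ToyWorldModel()
    lengths = []
    for _ in range(rounds):
        trajectory = run_episode(world_model, rng)
        world_model.update(trajectory)  # update on the agent's own distribution
        lengths.append(len(trajectory))
    return world_model, lengths


if __name__ == "__main__":
    model, episode_lengths = co_evolve()
    print("episode lengths:", episode_lengths)
    print("state-action pairs known to the world model:", len(model.table))
```

Because the world model is only ever updated on trajectories the agent itself produced, its coverage follows the agent's behavior, which is the property the paper attributes to the closed loop.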

Results
COMAP reports gains on benchmarks spanning embodied task planning, web navigation, and tool use, including a +16.75% relative improvement over the Qwen3-4B baseline. The results indicate that the co-evolutionary loop improves the accuracy of the world model’s predictions over time and supports more effective long-horizon decision-making. The paper reports detailed quantitative comparisons against competitive baselines.
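For readers unsure how a relative improvement differs from an absolute one, the short sketch below may help. The scores are hypothetical and are not taken from the paper; the function names are illustrative only. A relative gain is measured as a fraction of the baseline score, so a +16.75% relative improvement means the new score is 1.1675 times the baseline, not 16.75 points higher.

```python
def absolute_improvement(baseline: float, new: float) -> float:
    """Difference in score, in the same units as the scores."""
    return new - baseline


def relative_improvement(baseline: float, new: float) -> float:
    """Gain of `new` over `baseline`, as a percentage of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Hypothetical scores for illustration only (not from the paper).
baseline_score = 40.0
new_score = 50.0

print(absolute_improvement(baseline_score, new_score))  # 10.0
print(relative_improvement(baseline_score, new_score))  # 25.0
```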

Limitations
The authors acknowledge that reliance on self-distillation may be a weakness in highly stochastic environments, where prediction accuracy can degrade. The paper does not address the computational overhead of continuously updating the world model, which may limit scalability in resource-constrained settings. Possible overfitting during self-distillation also warrants further investigation.

Why it matters
COMAP is relevant to building more adaptive language agents for complex, interactive environments. By letting agents anticipate and evaluate actions against a world model that evolves with them, the framework could inform work on autonomous decision-making systems. The findings add to ongoing research on agent learning and world modeling, alongside related work on reinforcement learning and self-supervised learning. The paper is published on arXiv under cs.CL.

By Callan Zhang · Jun 1, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.CL