StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction
Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang
- Published: May 7, 2026, 17:51 UTC
Problem
This paper targets the shortcomings of current reinforcement learning (RL) methods for optimizing large language models (LLMs) on long-horizon decision-making tasks. Existing approaches are predominantly reactive, choosing each action from the immediate context, which hampers exploration and makes credit assignment across extended trajectories difficult. The authors propose Strategic Trajectory Abstraction (StraTA), a framework that fills this gap by introducing an explicit strategy at the trajectory level. This work is presented as a preprint and has not yet undergone peer review.
Method
The core technical contribution of StraTA is a hierarchical framework that couples trajectory-level strategy generation with action execution. The agent first samples a compact strategy from the initial task state; this strategy then conditions every subsequent action. A GRPO-style rollout design enables joint training of strategy generation and action execution, with diverse strategy rollouts providing the comparison group, and a critical self-judgment mechanism lets the agent evaluate its own rollouts. The summary does not specify compute requirements, but the architecture draws on hierarchical reinforcement learning principles to improve both sample efficiency and final performance.
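The two-level rollout described above can be sketched as follows. This is a hypothetical illustration based only on this summary: a toy environment and policy stand in for the paper's LLM agent, and `sample_strategy`, `sample_action`, and the environment interface are assumptions, not the authors' API.

```python
import random

class ToyEnv:
    """Minimal stand-in environment: reward 1.0 on reaching step `goal`."""
    def __init__(self, goal=3):
        self.goal, self.t = goal, 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        done = self.t >= self.goal
        return self.t, float(done), done

class ToyPolicy:
    """Stand-in hierarchical policy; StraTA uses an LLM at both levels."""
    def sample_strategy(self, state):
        # Level 1: a compact, trajectory-level strategy sampled once
        # from the initial task state.
        return random.choice(["explore", "exploit"])
    def sample_action(self, state, strategy):
        # Level 2: each action is conditioned on (state, strategy).
        return 0 if strategy == "exploit" else 1

def rollout(env, policy, max_steps=10):
    state = env.reset()
    strategy = policy.sample_strategy(state)  # fixed for the whole episode
    traj, total = [], 0.0
    for _ in range(max_steps):
        action = policy.sample_action(state, strategy)
        state, reward, done = env.step(action)
        traj.append((state, action, reward))
        total += reward
        if done:
            break
    return strategy, traj, total
```

Sampling the strategy once and holding it fixed is what makes the abstraction trajectory-level: credit can be assigned to the strategy as a whole rather than to individual reactive steps.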
Results
The experimental evaluation shows StraTA outperforming strong baselines across multiple benchmarks: a 93.1% success rate on ALFWorld, 84.2% on WebShop, and an overall score of 63.5% on SciWorld, surpassing leading closed-source models. The gains in both sample efficiency and final performance suggest that strategic trajectory abstraction does address the core difficulties of long-horizon decision-making in RL.
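The sample-efficiency gains are plausibly tied to the GRPO-style rollout design from the Method section, which scores each rollout against its sibling rollouts rather than against a learned critic. The sketch below shows the standard group-relative advantage normalization over scalar episode returns; the paper's exact variant may differ.

```python
import statistics

def group_relative_advantages(returns):
    """GRPO-style advantages: normalize each rollout's return by the
    group mean and (population) standard deviation, so rollouts that
    beat their siblings get positive advantage without a value model."""
    mean = statistics.mean(returns)
    std = statistics.pstdev(returns)
    if std == 0:
        # All rollouts scored the same: no learning signal from this group.
        return [0.0 for _ in returns]
    return [(r - mean) / std for r in returns]
```

For a group of returns `[1.0, 0.0, 1.0, 0.0]` the advantages are `[1.0, -1.0, 1.0, -1.0]`: successful strategies are reinforced and failed ones suppressed, relative to the group.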
Limitations
The authors acknowledge several limitations: the compact strategy representation may overfit, strategy sampling depends on the quality of the initial task state, and the framework may require extensive hyperparameter tuning to perform well across different environments. An additional limitation not explicitly mentioned is scalability: more complex or dynamic environments may strain the effectiveness of the trajectory abstraction.
Why it matters
The implications of this work are significant for the field of reinforcement learning, particularly in the context of training LLMs as interactive agents. By introducing a structured approach to strategy abstraction, StraTA paves the way for more effective exploration and credit assignment in long-horizon tasks. This could lead to advancements in various applications, including robotics, game playing, and interactive AI systems, where decision-making over extended periods is crucial. The framework’s ability to enhance sample efficiency and performance metrics also suggests potential for broader applicability in RL research and development.
Authors: Xiangyuan Xue, Yifan Zhou, Zidong Wang, Shengji Tang, Philip Torr, Wanli Ouyang, Lei Bai, Zhenfei Yin
Source: arXiv:2605.06642
URL: https://arxiv.org/abs/2605.06642v1