Notable efficiency inference

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

Yu Xia, Zhouhang Xie, Xin Xu, Byungkyu Kang, Prarit Lamba, Xiang Gao

Published
Jun 2, 2026 — 17:51 UTC

Problem
This work addresses the inefficiency in token usage and lack of inference-time control in large language models (LLMs) during chain-of-thought reasoning. Existing methods for efficient reasoning, such as early stopping and trace compression, do not explicitly control how the model reasons, leading to suboptimal performance. The authors propose a novel approach, Agentic Chain-of-Thought Steering (ACTS), to fill this gap by providing a structured mechanism for steering reasoning processes. This paper is a preprint and has not yet undergone peer review.

Method
ACTS formulates the reasoning process as a Markov decision process (MDP), where a controller agent interacts with a frozen reasoner. The controller observes the current reasoning trace and the remaining token budget, issuing steering actions that consist of a reasoning strategy and a steering phrase to guide the next step of the reasoner. The controller is initialized using synthetic steering trajectories generated through multi-budget augmentation and is further refined via reinforcement learning, utilizing a budget-conditioned reward shaping mechanism. This approach allows for adaptive control over reasoning length and strategy, optimizing for both accuracy and efficiency.

Results
The authors conducted experiments across multiple benchmarks, demonstrating that ACTS achieves performance comparable to full-thinking models while significantly reducing token usage. For instance, on the MATH benchmark, ACTS achieved a 15% reduction in tokens while maintaining accuracy levels similar to the baseline models. Additionally, ACTS allows for controllable trade-offs between accuracy and efficiency, enabling users to adjust the reasoning process based on specific task requirements. The results indicate that ACTS can effectively balance the need for thorough reasoning with the constraints of computational resources.

Limitations
The authors acknowledge several limitations in their work. The reliance on synthetic steering trajectories may not fully capture the complexities of real-world reasoning tasks, potentially limiting generalizability. Additionally, the performance of ACTS may vary depending on the specific architecture of the reasoner used, which could affect its applicability across different LLMs. The authors also note that while the controller can adaptively steer reasoning, it may still struggle with highly complex tasks that require extensive reasoning beyond the budget constraints.

Why it matters
The implications of this work are significant for the development of more efficient LLMs capable of controlled reasoning. By enabling budget-aware reasoning strategies, ACTS can enhance the usability of LLMs in resource-constrained environments, making them more accessible for real-time applications. This research contributes to the ongoing discourse on optimizing LLM performance while managing computational costs, as discussed in related works on efficient reasoning strategies. The findings are relevant for future research in adaptive reasoning and model efficiency, as published in arXiv cs.AI.

Turing Wire

By Turing Wire editorial staff · Jun 2, 2026 · Editorial standards →

Source: arXiv cs.AI