HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents
Yaxin Du, Yifan Zhou, Yujie Ge, Jiajun Wang, Xianghe Pang, Shuo Tang
- Published
- Jun 11, 2026 — 17:56 UTC
Problem
Current tool-augmented large language model (LLM) agents typically rely on step-wise atomic tool calls, leading to an execution-granularity mismatch. This mismatch arises because locally deterministic workflows are decomposed into multiple model-visible decisions, which consume context and require the model to manage low-level data flow. The authors identify this limitation in existing methodologies and propose a solution to streamline tool interactions. This work is presented as a preprint, indicating it has not yet undergone peer review.
Method
The core contribution of this paper is the introduction of HyperTool, a unified executable interface that allows LLMs to invoke tools using a single code block. This interface enables the model to call existing tools through their original schemas, manipulate returned values, and pass intermediate results locally, effectively folding multiple deterministic tool subroutines into a single outer call. To train models on this new interface, the authors synthesize HyperTool-format trajectories from cross-tool compositional tasks and validate them in real multi-tool environments (MCP). The training process leverages a combination of synthetic data generation and real-world task execution to ensure robustness and applicability.
Results
HyperTool demonstrates significant improvements in accuracy on the MCP-Universe benchmark. Specifically, the average accuracy for Qwen3-32B increases from 15.69% to 35.29%, while for Qwen3-8B, it rises from 9.93% to 33.33%. Additionally, HyperTool outperforms established baselines, including GPT-OSS and Kimi-k2.5, in terms of average accuracy, indicating its effectiveness in enhancing multi-step tool usage.
Limitations
The authors acknowledge that the reliance on synthetic data for training may limit the generalizability of the model in real-world scenarios. They also note that the performance gains may vary depending on the complexity of the tasks and the specific tools involved. Furthermore, the paper does not address potential scalability issues when integrating HyperTool with a larger variety of tools or in more complex environments.
Why it matters
The introduction of HyperTool has significant implications for the development of more efficient tool-augmented agents, as it addresses the execution-granularity mismatch that hampers current methodologies. By allowing LLMs to manage tool interactions more effectively, HyperTool can enhance the performance of agents in complex tasks requiring multiple tool calls. This advancement could lead to broader applications in areas such as automated reasoning, decision-making systems, and interactive AI agents, as discussed in related works on tool-augmented learning and multi-agent systems, as published in arXiv.
By Turing Wire editorial staff · Jun 11, 2026 · Editorial standards →
Source: arXiv cs.CL