Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents
Yujun Zhou, Kehan Guo, Haomin Zhuang, Xiangqi Wang, Yue Huang, Zhenwen Liang
Published Jun 11, 2026, 10:43 UTC
Problem — The paper addresses a persistent failure of interactive LLM agents: they do not retain user preferences across sessions, so they repeatedly violate corrections the user has already given. Existing memory mechanisms do not close this gap. With Mem0, for example, 57.5% of applicable preference checks are still left unaddressed. What is missing is an effective way for coding agents to learn from user interactions in real time. This work is a preprint and has not undergone peer review.
Method — The authors propose Test-time Rule Acquisition and Compiled Enforcement (TRACE), a skill-layer pipeline for coding-agent runtimes. TRACE mines user corrections during interactions, converts them into atomic rules, and compiles those rules into runtime checks that an agent must satisfy before it can complete future tasks. This contrasts with conventional approaches, in which developers define the rules in advance. TRACE is evaluated in simulated user-in-the-loop experiments on two task sets: ClawArena, which focuses on coding-agent tasks, and MemoryArena, which emphasizes memory-intensive tasks.
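The mine, convert, compile, and gate flow described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `AtomicRule` schema, the two-phrase regex that stands in for correction mining (the summary does not say how TRACE mines corrections), and the substring predicates that stand in for compiled runtime checks. It only shows the shape of the idea: each correction becomes one checkable constraint, and a task may complete only when no constraint is violated.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AtomicRule:
    """One user preference reduced to a single checkable constraint (hypothetical schema)."""
    kind: str    # "forbid" or "require"
    token: str   # literal text the agent's output must avoid or contain
    source: str  # the user correction this rule was mined from


# Hypothetical stand-in for correction mining: only two phrasings are understood,
# "never use X" and "always use X", with an optional trailing period.
_CORRECTION = re.compile(r"^(never|always) use (.+?)\.?$", re.IGNORECASE)


def mine_rule(correction: str) -> Optional[AtomicRule]:
    """Convert a correction into an atomic rule, or return None if it is not understood."""
    m = _CORRECTION.match(correction.strip())
    if m is None:
        return None
    kind = "forbid" if m.group(1).lower() == "never" else "require"
    return AtomicRule(kind, m.group(2), correction)


def compile_rule(rule: AtomicRule) -> Callable[[str], bool]:
    """Compile a rule into a predicate over the agent's proposed output (True = satisfied)."""
    if rule.kind == "forbid":
        return lambda output: rule.token not in output
    return lambda output: rule.token in output


def gate_completion(output: str, rules: list[AtomicRule]) -> list[AtomicRule]:
    """Return the violated rules; an empty list means the task is allowed to complete."""
    return [r for r in rules if not compile_rule(r)(output)]


# Usage: two corrections from earlier sessions, checked against a new proposed output.
corrections = ["Never use print.", "Always use pathlib"]
rules = [r for c in corrections if (r := mine_rule(c)) is not None]
violations = gate_completion("import pathlib\nprint('done')", rules)
print([r.token for r in violations])  # prints ['print']
```

Here the completion is blocked because the output still contains `print`; once the agent revises its output so that `gate_completion` returns an empty list, the task may finish. A real runtime would check richer artifacts (diffs, commands, file trees) than a single string.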
Results — TRACE substantially reduces preference violations. On ClawArena, it lowers held-out preference violations from 100.0% to 37.6% on in-distribution tasks and from 100.0% to 2.0% on out-of-distribution tasks. On MemoryArena-derived tasks, it lowers in-distribution violations from 100.0% to 60.5% while matching or surpassing the strongest memory baseline on task pass rate. These results indicate that TRACE mitigates a critical failure mode that memory alone leaves open, improving how agents work with their users.
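The reported figures can be read as percentage-point drops. Because each starting rate is 100.0%, the absolute drop (in percentage points) and the relative drop coincide. The following Python sketch uses only the numbers quoted above; the `reduction` helper is illustrative and not from the paper.

```python
# Held-out preference-violation rates (%) as quoted in the Results paragraph.
REPORTED = {
    "ClawArena (in-distribution)": (100.0, 37.6),
    "ClawArena (out-of-distribution)": (100.0, 2.0),
    "MemoryArena-derived (in-distribution)": (100.0, 60.5),
}


def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in %), rounded to one decimal."""
    points = before - after
    relative = 100.0 * points / before
    return round(points, 1), round(relative, 1)


for name, (before, after) in REPORTED.items():
    points, relative = reduction(before, after)
    print(f"{name}: -{points} pp ({relative}% relative)")
```

With a nonzero but sub-100% starting rate, the two numbers would diverge, so it matters which one a report quotes.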
Limitations — The authors acknowledge that TRACE reduces preference violations without eliminating them, particularly for highly complex or nuanced user preferences. Because rules are derived from user corrections, they may also be biased by how specific and how frequent those corrections are. The paper does not address how TRACE scales to environments with diverse user bases, or the computational overhead that real-time rule compilation may introduce.
Why it matters — By letting agents learn from user corrections as they happen, TRACE points toward coding agents that adapt to the people using them, which could improve user satisfaction and efficiency in collaborative coding environments. The work contributes to the broader effort to improve human-AI interaction and the practical deployment of LLMs in real-world applications. It was posted to arXiv cs.CL.
By Turing Wire editorial staff · Jun 11, 2026
Source: arXiv cs.CL