Engineering Robustness into Personal Agents with the AI Workflow Store
Roxana Geambasu, Mariana Raykova, Pierre Tholoniat, Trishita Tiwari, Lillian Tsai, Wen Zhang
- Published
- May 11, 2026 — 17:46 UTC
- Summary length
- 456 words
- Relevance score
- 70%
Problem
This paper addresses the gap in the current paradigm of AI agents, which predominantly relies on “on-the-fly” synthesis of plans and actions in response to user prompts. The authors argue that this approach undermines established software engineering (SE) practices, such as iterative design, rigorous testing, and adversarial evaluation, which are essential for creating reliable and secure systems. The work is presented as a preprint and has not undergone peer review, highlighting the need for a more disciplined approach to developing AI agents that can be safely deployed in high-stakes scenarios.
Method
The authors propose the concept of an AI Workflow Store, which consists of hardened and reusable workflows that AI agents can invoke. This approach aims to integrate rigorous SE processes into the agentic loop, thereby enhancing the reliability and security of AI systems. The paper outlines several research challenges associated with this vision, particularly the flexibility-robustness tension that arises when moving away from the rapid synthesis paradigm. While specific architectural details, loss functions, or training compute requirements are not disclosed, the emphasis is on the need for a structured framework that allows for the reuse of workflows across a broad user community, potentially requiring additional computational resources and time.
Results
The paper does not present empirical results or quantitative comparisons against named baselines or benchmarks, as it primarily focuses on conceptualizing the AI Workflow Store and the associated challenges. The authors emphasize the theoretical benefits of integrating SE practices into AI workflows, suggesting that such an approach could lead to significantly more robust and secure AI agents compared to those relying on improvised, real-time synthesis.
Limitations
The authors acknowledge that implementing the proposed AI Workflow Store may incur additional computational costs and time, which could be a barrier to adoption. They do not provide specific metrics or case studies to validate their claims, leaving a gap in empirical evidence to support the theoretical framework. Additionally, the paper does not address potential scalability issues or the complexity of maintaining a repository of workflows that can accommodate diverse user needs and scenarios.
Why it matters
This work has significant implications for the development of AI agents, particularly in contexts where reliability and security are paramount. By advocating for the integration of rigorous SE processes, the authors highlight the necessity of moving beyond the current paradigm of rapid synthesis to create production-grade systems. The proposed AI Workflow Store could serve as a foundational framework for future research, enabling the development of more robust AI agents that can be safely deployed in high-stakes environments. This shift could ultimately lead to greater trust in AI systems and their applications across various domains.
Authors: Roxana Geambasu, Mariana Raykova, Pierre Tholoniat, Trishita Tiwari, Lillian Tsai, Wen Zhang
Source: arXiv:2605.10907
URL: https://arxiv.org/abs/2605.10907v1