Notable agents robotics

SkillCAT: Contrastive Assessment and Topology-Aware Skill Self-Evolution for LLM Agents

Kunfeng Chen, Qihuang Zhong, Juhua Liu, Bo Du

Published
Jun 11, 2026 — 13:12 UTC

Problem
Current skill self-evolution methods for large language model (LLM) agents are limited by their reliance on single execution trajectories per task, which restricts the diversity of learned skills. Existing pipelines merge candidate skill patches without thorough evaluation and require loading the entire skill corpus during inference, leading to inefficiencies. This paper addresses these limitations by proposing a novel framework, SkillCAT, which separates the skill evolution process into distinct stages. Notably, this work is a preprint and has not undergone peer review.

Method
SkillCAT comprises three main components:

  1. Contrastive Causal Extraction (CCE): This stage samples multiple trajectories for each task and employs a contrastive approach to compare success and failure pairs within the same task. By analyzing these pairs, CCE identifies critical evidence that elucidates the differences in outcomes, thereby enhancing the understanding of skill effectiveness.
  2. Assessment-Augmented Evolution (AAE): In this phase, candidate skill patches are replayed on clones of the source task. Only those patches that either improve or maintain task performance are retained, ensuring that only effective skills are merged hierarchically.
  3. Topology-Aware Task Execution (TTE): This final stage organizes the evolved skills into a routable sub-skill topology, allowing the inference process to load only the relevant capability nodes for a given task, thus optimizing resource usage.

The authors do not disclose specific training compute requirements, as the framework is designed to be training-free.

Results
SkillCAT was evaluated on several established benchmarks, including SpreadsheetBench, WikiTableQuestions, and DocVQA. The framework demonstrated a significant performance improvement, achieving an average score increase of up to 40.40% over baseline methods. This enhancement was consistent across various settings, including cross-model and out-of-distribution generalization tests, indicating robust skill evolution capabilities without necessitating model retraining.

Limitations
The authors acknowledge that while SkillCAT effectively enhances skill evolution, it may still be constrained by the quality and diversity of the initial trajectory samples. Additionally, the reliance on task clones for assessment may limit the generalizability of the evolved skills to novel tasks not represented in the training data. The paper does not address potential scalability issues when applied to a larger number of tasks or more complex skill hierarchies.

Why it matters
The implications of SkillCAT are significant for the development of more efficient LLM agents capable of adaptive skill evolution. By decoupling the skill learning process from model training, this framework opens avenues for real-time skill adaptation and deployment in dynamic environments. The approach could lead to advancements in autonomous agents and interactive systems, enhancing their ability to learn from diverse experiences. This work contributes to the ongoing discourse in the field, as published in arXiv cs.CL.

Turing Wire

By Turing Wire editorial staff · Jun 11, 2026 · Editorial standards →

Source: arXiv cs.CL