CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning
Linas Nasvytis, Simon Jerome Han, Ben Prystawski, Satchel Grant, Noah D. Goodman, Judith E. Fan
Published: May 27, 2026, 17:01 UTC
Problem
This preprint addresses the cost of existing methods for improving reasoning in language models. Both parametric approaches (e.g., Reinforcement Learning with Verifiable Rewards, RLVR) and non-parametric approaches (e.g., prompt optimization) typically require many training samples and many model rollouts, which can be prohibitively expensive or infeasible. The authors propose CORE (Contrastive Reflection), which reuses past reasoning traces to improve model performance with fewer resources.
Method
CORE is a non-parametric learning algorithm that contrasts past reasoning attempts. It generates concise natural-language descriptions of the reasoning strategies and constraints that distinguish successful attempts from unsuccessful ones. This distills insights from reasoning traces without extensive model retraining or prompt adjustments. The authors report that CORE works with as few as five training samples and a fixed rollout budget, and that it achieves significant performance improvements over existing methods. The method is designed to be context-efficient: it requires fewer prompt tokens, and the compact representation of learned knowledge keeps it interpretable.
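The contrastive step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Trace` structure, the prompt wording, and the function names `build_contrast_prompt`, `contrastive_reflection`, and `augment_prompt` are all hypothetical, and the `reflect` callable stands in for whatever language model call produces the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trace:
    """One past reasoning attempt (hypothetical structure, not from the paper)."""
    problem: str
    reasoning: str
    correct: bool


def build_contrast_prompt(successes: List[Trace], failures: List[Trace]) -> str:
    """Lay out successful and failed traces side by side for a reflector model."""
    lines = ["Successful attempts:"]
    lines += [f"- {t.reasoning}" for t in successes]
    lines.append("Failed attempts:")
    lines += [f"- {t.reasoning}" for t in failures]
    lines.append("State briefly what the successful attempts did differently.")
    return "\n".join(lines)


def contrastive_reflection(
    traces: List[Trace], reflect: Callable[[str], str]
) -> Optional[str]:
    """Return a short natural-language insight, or None when there is no contrast.

    `reflect` is an assumed stand-in for a language model call.
    """
    successes = [t for t in traces if t.correct]
    failures = [t for t in traces if not t.correct]
    if not successes or not failures:
        return None  # a contrast needs at least one trace of each kind
    return reflect(build_contrast_prompt(successes, failures))


def augment_prompt(question: str, insights: List[str]) -> str:
    """Prepend the distilled insights to a new question."""
    if not insights:
        return question
    notes = "\n".join(f"- {s}" for s in insights)
    return f"Lessons from earlier attempts:\n{notes}\n\nQuestion: {question}"
```

In this sketch the learned knowledge is just a short list of strings added to the prompt, which is one way to read the claims about context efficiency and interpretability. How CORE actually groups traces, selects insights, and spends its rollout budget is not specified in this summary.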
Results
The authors ran experiments on four reasoning tasks, comparing CORE against several baselines: GRPO (a parametric method) and GEPA, episodic RAG, and MemRL (all non-parametric). CORE improved faster and more efficiently than these methods, achieving comparable or superior performance gains with fewer rollouts. Specific performance metrics were not disclosed in the abstract, but the results indicate that distilling contrasts between reasoning traces yields more effective self-improvement than traditional weight updates or prompt optimization techniques.
Limitations
The authors acknowledge that while CORE demonstrates significant improvements, it may still be limited by the quality and diversity of the reasoning traces it analyzes. Additionally, the reliance on natural-language descriptions for insight generation could introduce variability based on the model’s language understanding capabilities. The paper does not address potential scalability issues when applied to more complex reasoning tasks or larger datasets, nor does it explore the implications of using CORE in real-world applications where reasoning tasks may not be as structured.
Why it matters
CORE offers a more efficient and interpretable route to model self-improvement, which could advance reasoning capabilities without the prohibitive costs of traditional methods. The approach may lead to more accessible and scalable AI systems that adapt and learn from their own experience in a more human-like manner. Because the insights CORE generates are written in natural language, they could also make AI decision-making easier to interpret, which matters for trust and accountability in AI applications.
Source: arXiv:2605.28742
URL: https://arxiv.org/abs/2605.28742v1
By Turing Wire editorial staff · May 27, 2026
Source: arXiv cs.AI