Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning
Benhao Huang, Zhengyang Geng, Zico Kolter
- Published
- May 20, 2026 — 17:59 UTC
Problem
This preprint addresses the gap in understanding the internal mechanisms that enable iterative models to generalize beyond memorized patterns during reasoning tasks. While scaling test-time compute through iterative updates has shown promise, the lack of clarity regarding how these models achieve generalization limits their applicability. The authors propose that generalizable reasoning can be achieved through learning task-conditioned attractors, which are latent dynamical systems with stable fixed points corresponding to valid solutions.
Method
The core technical contribution is the introduction of Equilibrium Reasoners (EqR), which formalize the concept of learning attractors to facilitate scalable reasoning. EqR operates by scaling internal dynamics in two dimensions: depth and breadth. Depth is increased by executing more iterations of the reasoning process, while breadth is enhanced by aggregating stochastic trajectories from multiple initializations. The architecture allows for unrolling up to 40,000 layers, enabling significant test-time scaling. The authors emphasize that the convergence of these models toward solution-aligned attractors is crucial for performance, with the ability to adaptively allocate compute resources based on task difficulty. The training compute specifics are not disclosed, but the method leverages iterative updates to optimize reasoning.
Results
The empirical results demonstrate substantial improvements in accuracy on the Sudoku-Extreme benchmark, where EqR achieves over 99% accuracy compared to just 2.6% for traditional feedforward models. This indicates an effect size that highlights the efficacy of the proposed method in scaling reasoning capabilities. The authors report that simpler tasks converge within 1 to 5 iterations, while more complex tasks benefit from extensive test-time scaling, reinforcing the utility of the attractor framework in enhancing model performance.
Limitations
The authors acknowledge that their approach may be limited by the complexity of the tasks and the potential for overfitting to specific attractor landscapes. They do not address the computational cost associated with unrolling a large number of layers, which may be prohibitive in resource-constrained environments. Additionally, the reliance on stochastic trajectories raises questions about the reproducibility of results across different runs. The paper does not explore the generalization of the attractor learning mechanism to other domains beyond the tested benchmarks.
Why it matters
This work has significant implications for the development of scalable reasoning systems in AI. By framing reasoning as a process of converging to learned attractors, the authors provide a novel perspective that could inform future architectures and training methodologies. The ability to adaptively allocate compute resources based on task difficulty could lead to more efficient models capable of tackling a wider range of complex problems. This research opens avenues for further exploration into the dynamics of latent spaces and their role in enhancing generalization in iterative models.
Authors: Benhao Huang, Zhengyang Geng, Zico Kolter
Source: arXiv:2605.21488
URL: https://arxiv.org/abs/2605.21488v1
By Turing Wire editorial staff · May 20, 2026 · Editorial standards →
Source: arXiv cs.LG