Notable reasoning Cognition

Unlocking the Working Memory of Large Language Models for Latent Reasoning

Lukas Aichberger, Sepp Hochreiter

Published
May 28, 2026 — 17:59 UTC

Problem
This paper addresses the limitations of current large language models (LLMs) in reasoning tasks, particularly the inefficiencies associated with autoregressive generation during test-time computation. The authors argue that this approach conflates internal reasoning processes with external communication, which is not reflective of human cognitive processes. They propose a novel method, Reasoning in Memory (RiM), to enhance the reasoning capabilities of LLMs by leveraging a working memory mechanism. This work is presented as a preprint and has not yet undergone peer review.

Method
The core technical contribution of this paper is the introduction of the RiM framework, which utilizes fixed sequences of special tokens as memory blocks to facilitate latent reasoning. Unlike traditional autoregressive methods, RiM allows for the processing of these memory blocks in a single forward pass, significantly improving computational efficiency. The authors implement a two-stage curriculum for training:

  1. Grounding Phase: In this phase, the model is trained to predict explicit reasoning steps after each memory block, providing a form of supervision.
  2. Refinement Phase: The model then discards the step-level supervision and iteratively refines its final answer based on the information held in the memory blocks.

The architecture remains consistent with existing LLMs, but the introduction of memory blocks is a key innovation that enables the model to utilize working memory effectively.

Results
The authors evaluate RiM against various reasoning benchmarks, demonstrating that it matches or surpasses the performance of existing latent reasoning methods across different language model families and sizes. Specific performance metrics are not disclosed in the abstract, but the results indicate a significant improvement in reasoning efficiency and accuracy compared to traditional autoregressive approaches. The paper suggests that RiM can achieve comparable or superior results while avoiding the computational overhead associated with generating intermediate reasoning steps.

Limitations
The authors acknowledge that the effectiveness of RiM may depend on the specific architecture of the underlying language model and the nature of the reasoning tasks. They do not address potential limitations related to the scalability of the memory block approach or the generalizability of the method across diverse domains. Additionally, the reliance on fixed token sequences may limit flexibility in handling more complex reasoning scenarios that require dynamic adjustments.

Why it matters
The introduction of RiM has significant implications for the development of more efficient reasoning mechanisms in LLMs. By decoupling internal reasoning from external communication, this approach aligns more closely with human cognitive processes, potentially leading to more robust and interpretable AI systems. The ability to utilize working memory effectively could pave the way for advancements in various applications, including natural language understanding, decision-making systems, and interactive AI agents. This work encourages further exploration of memory-based architectures in LLMs, which could enhance their reasoning capabilities and overall performance.

Authors: Lukas Aichberger, Sepp Hochreiter
Source: arXiv:2605.30343
URL: https://arxiv.org/abs/2605.30343v1

Turing Wire

By Turing Wire editorial staff · May 28, 2026 · Editorial standards →

Source: arXiv cs.AI