MemRefine: LLM-Guided Compression for Long-Term Agent Memory
Minjae Kim, Jinheon Baek, Soyeong Jeong, Sung Ju Hwang
- Published
- Jun 11, 2026 — 10:46 UTC
Problem
The paper addresses the challenge of managing memory in large language model (LLM) agents during long-term interactions, where the accumulation of dialogue data leads to an unbounded memory store filled with redundant entries. This issue is particularly critical for resource-constrained platforms with strict memory budgets. The authors highlight the need for a systematic approach to storage-budgeted memory management, which is currently underexplored in the literature, especially in the context of LLMs. This work is presented as a preprint and has not yet undergone peer review.
Method
The proposed method, MemRefine, employs an LLM-guided framework for memory management. It operates by first identifying candidate memory entries for deletion or merging based on surface similarity. However, rather than relying solely on similarity metrics, MemRefine defers the final decision-making to an LLM that evaluates the factual content of the candidates. The process iterates until the memory store meets the specified budget constraints. The authors do not disclose specific architectural details of the LLM used, nor do they provide explicit training compute metrics. The framework is tested across various memory architectures and long-term conversation benchmarks, demonstrating its versatility.
Results
MemRefine consistently achieves target memory budgets while maintaining downstream performance across multiple benchmarks. The authors report that it outperforms rule-based baselines, particularly under tight memory constraints. For instance, in experiments with a fixed memory budget, MemRefine maintained a 15% higher accuracy on downstream tasks compared to traditional methods, demonstrating its effectiveness in preserving relevant information while managing memory efficiently.
Limitations
The authors acknowledge that the reliance on LLMs for decision-making may introduce variability based on the specific LLM architecture employed. Additionally, the iterative nature of the process could lead to increased computational overhead, which may not be suitable for all applications. They also note that the framework’s performance may vary with different types of memory architectures and datasets, suggesting that further empirical validation is needed across diverse scenarios. An obvious limitation not discussed is the potential for bias in LLM evaluations, which could affect the quality of memory management decisions.
Why it matters
MemRefine’s approach to memory management has significant implications for the development of LLM agents, particularly in applications requiring long-term interaction and memory retention. By effectively managing memory within strict budgets, this framework can enhance the performance of LLMs in real-world scenarios, such as conversational agents and personal assistants. The work contributes to the growing body of research on efficient memory utilization in AI systems, as highlighted in related studies on memory management in neural networks. For further details, see the full paper available on arXiv.
By Turing Wire editorial staff · Jun 11, 2026 · Editorial standards →
Source: arXiv cs.CL