EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
Yuyang Li, Yime He, Zeyu Zhang, Dong Gong
Problem
This paper addresses the limitations of existing retrieval methods in long-term conversational memory systems, particularly for temporal and multi-hop questions. Single-pass retrieval is inadequate for these tasks, and existing iterative methods that refine queries lack a systematic way to identify and address evidence gaps: specific pieces of information missing from the retrieval set. The authors propose EviMem, a framework that explicitly diagnoses these gaps, which they position as novel relative to prior work. This work is a preprint and has not yet undergone peer review.
Method
EviMem integrates two components: IRIS (Iterative Retrieval via Insufficiency Signals) and LaceMem (Layered Architecture for Conversational Evidence Memory). IRIS runs a closed loop that evaluates whether the retrieved evidence is sufficient to answer the question, identifies what is missing, and uses that diagnosis to guide targeted query refinement. This contrasts with traditional iterative methods, which refine queries without pinpointing the specific deficiencies in the retrieval set. LaceMem supplies a hierarchical memory architecture whose layers enable fine-grained diagnosis of evidence gaps, supporting more effective retrieval. The authors do not disclose the training compute or dataset sizes used in their experiments.
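The closed loop described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the names (`retrieve`, `evidence_gap`, `iterative_retrieve`) and the keyword-overlap retriever are assumptions standing in for IRIS's learned sufficiency signals, and the `session`/`text` fields are a minimal stand-in for LaceMem's layered memory.

```python
# Toy sketch of evidence-gap-driven iterative retrieval (assumed design,
# not the authors' code). The real IRIS presumably uses an LLM to judge
# sufficiency; here keyword coverage is the stand-in "insufficiency signal".
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    session: int   # coarse layer: which conversation session
    text: str      # fine layer: the utterance itself

def retrieve(memory, query, k=2):
    """Rank entries by keyword overlap with the query (toy retriever)."""
    terms = set(query.lower().split())
    return sorted(memory,
                  key=lambda e: -len(terms & set(e.text.lower().split())))[:k]

def evidence_gap(question_facts, retrieved):
    """Return the facts not yet covered by any retrieved entry (the 'gap')."""
    covered = " ".join(e.text.lower() for e in retrieved)
    return [f for f in question_facts if f.lower() not in covered]

def iterative_retrieve(memory, query, question_facts, max_rounds=3):
    """Closed loop: retrieve, diagnose the gap, refine the query, repeat."""
    evidence = []
    for _ in range(max_rounds):
        evidence += [e for e in retrieve(memory, query) if e not in evidence]
        gap = evidence_gap(question_facts, evidence)
        if not gap:            # evidence judged sufficient: stop early
            break
        query = " ".join(gap)  # refine the query toward the missing facts
    return evidence, gap

memory = [
    MemoryEntry(1, "Alice adopted a cat named Milo in March"),
    MemoryEntry(2, "Milo got sick and visited the vet"),
    MemoryEntry(3, "Alice started a new job at a bakery"),
]
evidence, gap = iterative_retrieve(
    memory, "When did Alice adopt her pet?", ["cat", "vet"]
)
```

In this example the first round retrieves only entries mentioning "Alice", leaving "vet" as the diagnosed gap; the refined second-round query then pulls in the vet visit, closing the gap. The point mirrored from the paper is that the refinement targets what is missing, rather than blindly rephrasing the whole question.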
Results
EviMem demonstrates significant improvements over the baseline method MIRIX on the LoCoMo benchmark. For temporal questions, Judge Accuracy increased from 73.3% to 81.6%; for multi-hop questions, it rose from 65.9% to 85.2%. Notably, EviMem achieves these results with 4.5x lower latency than MIRIX, a substantial efficiency gain alongside the accuracy improvements. These results suggest that EviMem effectively addresses the evidence-gap problem in long-term conversational memory systems.
Limitations
The authors acknowledge that their approach may still struggle with highly complex queries that require extensive contextual understanding beyond the current memory architecture. Additionally, the reliance on a specific benchmark (LoCoMo) may limit the generalizability of the results to other datasets or real-world applications. The paper does not discuss potential scalability issues or the impact of varying the size of the memory hierarchy on performance.
Why it matters
The implications of EviMem are significant for the development of conversational agents and systems that require long-term memory capabilities. By providing a method to systematically identify and address evidence gaps, EviMem enhances the ability of conversational systems to maintain context over extended interactions, improving user experience and engagement. This work lays the groundwork for future research into more sophisticated memory architectures and retrieval strategies, potentially influencing the design of next-generation conversational AI systems.
Source: arXiv cs.CL
URL: https://arxiv.org/abs/2604.27695v1