Notable efficiency inference

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

Mufei Li, Shikun Liu, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li

Published: Jun 15, 2026 — 17:53 UTC

Problem
The paper addresses the challenge of post-hoc context erasing in KV caches for long-context language models (LLMs), where local edits can have global repercussions. Specifically, once a span is processed, its influence propagates into the cached states of subsequent tokens, complicating the erasure of stale or harmful information. The authors highlight that traditional methods require full recomputation of all tokens following the erased span, leading to inefficiencies that scale with suffix length rather than the length of the erased span. This work is a preprint and has not undergone peer review.

Method
KVEraser introduces a learned approach to KV-cache editing that allows for efficient localized context erasing. The method replaces only the KV states of the erased interval with learned steering states while keeping the remaining cache intact. The authors employ a two-stage training pipeline: the first stage involves generic span-neighbor pre-training, which teaches the model to suppress the influence of the erased span. The second stage is task-specific fine-tuning, which adapts the erasing capability to various downstream applications. The architecture details and specific loss functions are not disclosed, but the training process emphasizes transferability across tasks.

Results
KVEraser demonstrates performance that nearly matches full recomputation in post-erasure tasks across context lengths ranging from 1K to 32K tokens. The latency increase for KVEraser is reported at only 24%, in stark contrast to a 17.6x increase associated with full recomputation. Additionally, KVEraser shows superior generalization to unseen long-document question-answering tasks that involve harmful factual distractors, achieving the best performance among approximate baselines with a 3–4x speedup over full recomputation.

Limitations
The authors acknowledge that while KVEraser performs well in in-domain tasks, its effectiveness in highly diverse or out-of-distribution scenarios remains to be fully evaluated. They also do not discuss potential limitations related to the generalizability of the learned steering states across different model architectures or the impact of varying task complexities on performance. Furthermore, the reliance on a two-stage training process may introduce additional overhead in terms of training time and resource requirements.

Why it matters
KVEraser has significant implications for the efficiency of long-context LLMs, particularly in applications requiring dynamic updates to cached information. By enabling localized context erasing without the need for full recomputation, this method can enhance the responsiveness and adaptability of LLMs in real-time applications, such as interactive dialogue systems and information retrieval tasks. The findings contribute to the ongoing discourse on optimizing LLM performance and resource utilization, as discussed in related works on context management in LLMs, as published in arXiv.

By Callan Zhang · Jun 15, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG