Notable unlearning

Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning

Bocheng Ju, Jianhua Wang, Chengliang Liu, Xiaolin Chang

Published
Jun 9, 2026 — 15:26 UTC

Problem
This work addresses the challenge of unlearning in large language models (LLMs), specifically the need to suppress undesirable knowledge while maintaining benign capabilities. Existing methods primarily focus on suppressing undesired outputs or responses, but they often lack a structured approach to specify replacement behavior and do not adequately constrain the locality of updates. This paper introduces a preprint framework, Null-Space Constrained Response-Specified Unlearning (NSRU), which aims to fill this gap by providing a more controlled and effective unlearning mechanism.

Method
NSRU employs a projection-constrained low-rank adaptation strategy to facilitate controlled unlearning. The method utilizes an explicitly structured safe target response to define the desired behavior for each forget query. It estimates per-module retain subspaces from benign hidden representations and applies an orthogonal-projected low-rank parameterization to confine updates to the null space of these retain subspaces. The optimization objective integrates safe-target learning, undesired-response suppression, and retention preservation, ensuring that updates do not adversely affect the model’s ability to retain benign knowledge. The authors provide a local first-order analysis demonstrating that this projected update minimizes perturbations in the retain subspace while allowing for effective shaping of forget-query behavior.

Results
Experiments conducted on the TOFU benchmark reveal that NSRU significantly suppresses extractable forget-set knowledge while enhancing retain question-answering (QA) performance and overall model utility. Specifically, NSRU outperforms representative baselines in safe-target alignment, achieving a notable improvement in performance metrics. On the WMDP benchmark, NSRU maintains hazardous-domain accuracy close to random-choice levels while preserving broad and domain-adjacent MMLU utility. The results indicate that NSRU effectively balances the trade-offs between unlearning and retention, with ablation studies confirming the importance of each component in the framework.

Limitations
The authors acknowledge that the proposed method is still in the preprint stage and has not undergone peer review. They also note potential limitations related to the scalability of the approach and the need for further empirical validation across diverse datasets and tasks. Additionally, while the method shows promise in controlled unlearning, the long-term effects of such adaptations on model performance in dynamic environments remain to be explored.

Why it matters
The implications of this work are significant for the field of machine learning, particularly in applications where data privacy and model adaptability are critical. The NSRU framework provides a structured approach to unlearning that can enhance the safety and reliability of LLMs in sensitive contexts. This research contributes to the ongoing discourse on ethical AI and responsible model deployment, as it offers a mechanism to mitigate the risks associated with retaining harmful knowledge. The findings and methodologies presented in this paper are relevant for future research in model unlearning and can inform the development of more robust and ethically aligned AI systems, as published in arXiv.

Turing Wire

By Turing Wire editorial staff · Jun 9, 2026 · Editorial standards →

Source: arXiv cs.AI