Decoupling Knowledge and Task Subspaces for Composable Parametric Retrieval Augmented Generation
Weihang Su, Hanwen Zhang, Qingyao Ai, Yiqun Liu
Problem
This paper addresses a gap in the compositional reliability of Parametric Retrieval-Augmented Generation (PRAG) systems. In existing PRAG implementations, document adapters are trained with task-supervised objectives, which entangles document knowledge with task behavior. When multiple adapters are merged at inference time, this entanglement can cause instability and dilute document-specific knowledge. The authors propose an approach to mitigate this issue, which is increasingly relevant as multi-document retrieval tasks grow in complexity. This work is presented as a preprint and has not yet undergone peer review.
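The summary does not specify how adapters are merged at inference time; a common composition rule for LoRA adapters is a weighted sum of their low-rank weight updates. The sketch below (the function name, the averaging rule, and all variables are illustrative assumptions, not the paper's method) shows why entanglement matters: the merged update is a plain sum, so any task-behavior component baked into each document adapter stacks up across adapters.

```python
import numpy as np

def merge_lora_updates(updates, weights=None):
    """Merge several LoRA updates (each given as a pair (B, A) with
    full update B @ A) by weighted averaging -- a common, naive
    composition rule assumed here for illustration."""
    if weights is None:
        weights = [1.0 / len(updates)] * len(updates)
    return sum(w * (B @ A) for w, (B, A) in zip(weights, updates))

rng = np.random.default_rng(0)
d, r = 32, 4
# Two hypothetical document adapters for the same base weight matrix
adapters = [(rng.standard_normal((d, r)), rng.standard_normal((r, d)))
            for _ in range(2)]
merged = merge_lora_updates(adapters)  # shape (d, d)
```

Because the merge is linear, interference between adapters is determined entirely by how much their individual updates overlap, which is what the paper's orthogonalization targets.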
Method
The core technical contribution is Orthogonal Subspace Decomposition (OSD) for training document adapters in PRAG systems. Training proceeds in two steps: first, a task LoRA (Low-Rank Adaptation) is trained to capture reusable task behavior; second, document-specific LoRAs are trained to encode knowledge in a subspace orthogonal to the task LoRA. This orthogonalization keeps task and document updates from interfering with each other, improving the compositional robustness of the system. Training compute details are not disclosed, but the method is evaluated across several knowledge-intensive tasks and model scales.
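The paper's exact orthogonalization procedure is not detailed in this summary. One standard way to realize the idea is to project each document update onto the orthogonal complement of the task subspace; the sketch below (function names, the QR-based projection, and all dimensions are assumptions) illustrates that construction, not the authors' implementation.

```python
import numpy as np

def project_out_task_subspace(delta_doc, task_basis):
    """Remove from a document update the component lying in the
    task subspace (the column span of task_basis)."""
    Q, _ = np.linalg.qr(task_basis)       # orthonormal basis for the task span
    return delta_doc - Q @ (Q.T @ delta_doc)

rng = np.random.default_rng(1)
d, r = 64, 4
task_basis = rng.standard_normal((d, r))  # columns span the task subspace
delta_doc = rng.standard_normal((d, d))   # hypothetical document LoRA update

delta_perp = project_out_task_subspace(delta_doc, task_basis)
# Residual overlap with the task subspace is numerically zero
overlap = np.linalg.norm(task_basis.T @ delta_perp)
```

With document updates constrained this way, a weighted merge of several adapters cannot perturb directions the task LoRA relies on, which is the intuition behind OSD's compositional robustness.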
Results
The authors report substantial improvements in compositional robustness with OSD compared to prior PRAG training. In multi-document PRAG scenarios, the orthogonalization strategy yields more stable merging and a stronger focus on document knowledge. Exact headline numbers are not provided in the abstract; the paper reports that OSD outperforms baseline PRAG implementations across multiple benchmarks, though specific metrics and effect sizes are not detailed in this summary.
Limitations
The authors acknowledge that their approach may require additional computational resources due to the two-step training process. They also note that while OSD improves compositional robustness, it may not fully eliminate the challenges associated with merging adapters, particularly in highly complex tasks. An additional limitation not explicitly mentioned is the potential for overfitting in the task-specific LoRA if not properly regularized, which could undermine the benefits of orthogonalization.
Why it matters
This work has significant implications for the design of future PRAG systems, particularly in scenarios requiring the integration of multiple document sources. By decoupling task and document knowledge, the proposed OSD method enhances the reliability of adapter composition, which is crucial for applications in knowledge-intensive domains such as question answering and information retrieval. The findings encourage further exploration of orthogonalization techniques in other areas of machine learning where compositionality is essential.
Source: arXiv:2604.26768
URL: https://arxiv.org/abs/2604.26768v1