Essential Subspace Merging for Multi-Task Learning
Longhua Li, Lei Qi, Xin Geng, Qi Tian
- Published
- Jun 17, 2026 — 15:07 UTC
Problem
This work addresses the challenge of inter-task interference in multi-task learning (MTL) when merging models fine-tuned from the same pre-trained checkpoint. The authors identify a gap in existing literature regarding the effective integration of task-specific updates without compromising the performance of individual tasks. The paper is a preprint and has not undergone peer review.
Method
The authors propose Essential Subspace Decomposition (ESD) to analyze task updates by examining the principal components of activation shifts. ESD identifies an “essential subspace,” which captures the most significant directions of task-relevant energy, while other directions contribute to interference. Building on ESD, the authors introduce Essential Subspace Merging (ESM), a static merging technique that orthogonalizes and fuses these essential components into a single compact model. ESM++ extends this approach to a dynamic merging method that decomposes task-specific residuals into low-rank experts, utilizing prototype-based routing to select the most relevant expert during inference. The methods are training-free, meaning they do not require additional training after the initial model fine-tuning.
Results
The authors conduct extensive experiments across various task sets and model scales, demonstrating that both ESM and ESM++ effectively preserve task knowledge while minimizing inter-task interference. Specific performance metrics include improvements in task accuracy and reduced parameter interference compared to baseline methods, although exact numerical results and baseline comparisons are not detailed in the abstract. The experiments indicate that ESM and ESM++ outperform traditional merging techniques, showcasing their robustness in maintaining task-specific performance.
Limitations
The authors acknowledge that their methods may not generalize well to all types of tasks, particularly those with highly divergent objectives. They also note that the static nature of ESM may limit adaptability in dynamic environments where task distributions change over time. Additionally, the paper does not address the computational overhead associated with the prototype-based routing in ESM++, which could impact real-time applications.
Why it matters
The proposed methods have significant implications for advancing multi-task learning frameworks, particularly in scenarios where model efficiency and performance are critical. By effectively merging task-specific knowledge while mitigating interference, ESM and ESM++ can enhance the deployment of models in resource-constrained environments. This work contributes to the ongoing discourse in MTL, providing a novel approach to model integration that could inspire further research and applications in areas such as natural language processing and computer vision, as published in arXiv.
By Callan Zhang · Jun 17, 2026 · Editorial standards →
Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.
Source: arXiv cs.AI