Combining Trained Models in Reinforcement Learning
Ujjwal Patil, Javad Ghofrani
- Published: May 4, 2026, 02:37 UTC
- Summary length: 420 words
- Relevance score: 70%
Problem
This paper addresses the fragmented literature on pretrained knowledge reuse in deep reinforcement learning (DRL), highlighting the high sample cost and limited transferability of DRL models across tasks. The authors conduct a systematic review of empirical studies to synthesize findings on various methods of knowledge reuse, including transfer learning, distillation, ensemble methods, and federated training. The review is motivated by the need for clearer comparisons across studies, as existing literature often varies in tasks, baselines, and computational budgets. This work is presented as a preprint and has not yet undergone peer review.
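Of the reuse strategies the review surveys, policy distillation is perhaps the simplest to state: a student policy is trained to match the action distribution of a pretrained teacher. The paper itself gives no implementation; the sketch below is purely illustrative, with the temperature value and function names chosen here, not taken from any reviewed study.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over action distributions, averaged over states.

    teacher_logits: pretrained source policy's logits, shape (states, actions).
    student_logits: target policy's logits, same shape.
    """
    p = softmax(teacher_logits / temperature)          # teacher distribution
    q = softmax(student_logits / temperature)          # student distribution
    log_p = np.log(p + 1e-12)
    log_q = np.log(q + 1e-12)
    return float(np.mean(np.sum(p * (log_p - log_q), axis=-1)))
```

Minimizing this loss with respect to the student's parameters transfers the teacher's behavior without reusing its weights directly, which is one reason distillation appears across otherwise dissimilar studies in the review.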
Method
The authors employ a PRISMA-guided systematic review methodology, starting from 589 records retrieved via IEEE Xplore, the ACM Digital Library, and citation tracing. After deduplication, 570 unique records were screened, yielding a final selection of 15 empirical studies for synthesis. The analysis focuses on three key factors: source-target similarity, diversity among reused models, and the fairness of comparisons against from-scratch baselines. The authors categorize the findings qualitatively, identifying patterns in the effectiveness of knowledge reuse strategies.
Results
The review reveals three significant patterns:
- Positive outcomes are predominantly observed when source and target tasks exhibit substantial structural similarities or when methods incorporate explicit gating or alignment mechanisms.
- Evidence supporting the efficacy of ensemble methods and federated aggregation is promising but limited, primarily confined to narrow experimental settings.
- Comparisons that match computational resources between methods are infrequent, undermining claims of efficiency improvements over stronger single-agent baselines.

The paper does not provide quantitative results or effect sizes, focusing instead on qualitative insights from the reviewed studies.
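The first pattern above points to explicit gating as a mechanism that makes reuse safe when source-target similarity is uncertain. A minimal sketch of such a gate, assuming a per-state scalar in [0, 1] that interpolates between a pretrained source policy and the target policy (the names and mixture form are this summary's illustration, not a method from any reviewed study):

```python
import numpy as np

def gated_policy(source_probs, target_probs, gate):
    """Mixture policy that gates a reused source policy per state.

    source_probs: action probabilities from the pretrained source policy,
                  shape (states, actions).
    target_probs: action probabilities from the from-scratch target policy.
    gate:         per-state weight in [0, 1]; near 1 trusts the source
                  (high task similarity), near 0 falls back to the target.
    """
    gate = np.clip(np.asarray(gate), 0.0, 1.0)[..., None]  # broadcast over actions
    mixed = gate * source_probs + (1.0 - gate) * target_probs
    return mixed / mixed.sum(axis=-1, keepdims=True)       # renormalize defensively
```

In practice the gate would itself be learned or derived from a similarity estimate; the point of the sketch is only that a gate bounds the damage from negative transfer, since setting it to zero recovers the from-scratch baseline.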
Limitations
The authors acknowledge several limitations, including the narrow scope of the studies reviewed and the lack of compute-matched comparisons, which may skew the interpretation of efficiency gains. Additionally, the provisional independence spectrum proposed for future benchmarking lacks validation, suggesting that it should be treated as a hypothesis rather than a definitive metric. The review’s reliance on qualitative synthesis may also limit the generalizability of its findings.
Why it matters
This work is significant for researchers and practitioners in DRL as it consolidates existing knowledge on pretrained model reuse, providing a clearer framework for understanding the conditions under which knowledge transfer is effective. By identifying gaps in the literature and proposing a structured approach to future benchmarking, the paper lays the groundwork for more rigorous comparisons in DRL research. The insights gained from this review could inform the design of more efficient DRL algorithms and encourage the development of standardized evaluation protocols.
Source: arXiv:2605.02159
URL: https://arxiv.org/abs/2605.02159v1