Difference-Aware Retrieval Policies for Imitation Learning
Quinn Pfeifer, Ethan Pronovost, Paarth Shah, Khimya Khetarpal, Siddhartha Srinivasa, Abhishek Gupta
- Published
- Jun 8, 2026 — 17:18 UTC
Problem
The paper addresses the limitations of parametric imitation learning, specifically behavior cloning, which often struggles with generalization to out-of-distribution states due to compounding errors during deployment. The authors highlight that existing methods do not effectively utilize training data during inference, leading to suboptimal performance in real-world applications. This work is presented as a preprint and has not undergone peer review.
Method
The core contribution is the Difference-Aware Retrieval Policies for Imitation Learning (DARP), which reformulates the imitation learning problem by focusing on local neighborhood structures instead of direct state-to-action mappings. DARP employs a semi-parametric retrieval-based approach, where the model predicts actions based on the $k$-nearest neighbors from expert demonstrations. The model incorporates the corresponding actions of these neighbors and the relative distance vectors between the neighbor states and the query states. This method does not necessitate additional data collection, online expert feedback, or task-specific knowledge, making it broadly applicable. The training process involves standard behavior cloning techniques, ensuring compatibility with existing frameworks.
Results
DARP demonstrates significant performance improvements over standard behavior cloning, achieving enhancements of 15-46% across various domains, including continuous control and robotic manipulation tasks. The evaluation includes diverse representations, such as high-dimensional visual features, indicating the robustness of the approach. Specific benchmarks and baseline comparisons are not detailed in the abstract, but the reported effect sizes suggest a substantial advancement in imitation learning capabilities.
Limitations
The authors acknowledge that while DARP improves generalization, it may still be sensitive to the quality and diversity of the expert demonstrations used for training. Additionally, the reliance on $k$-nearest neighbors could introduce challenges in high-dimensional spaces, where the curse of dimensionality may affect retrieval efficacy. The paper does not address potential computational overhead associated with maintaining and querying the neighborhood structure during inference.
Why it matters
The implications of DARP are significant for advancing imitation learning methodologies, particularly in scenarios where expert demonstrations are limited or costly to obtain. By effectively leveraging local neighborhood information, DARP can enhance the adaptability of learned policies in dynamic environments, paving the way for more robust robotic systems and autonomous agents. This work contributes to the ongoing discourse in imitation learning and retrieval-based methods, as discussed in related literature, and is available on arXiv.
By Turing Wire editorial staff · Jun 8, 2026 · Editorial standards →
Source: arXiv cs.AI