Notable agents robotics

Active Context Selection Improves Simple Regret in Contextual Bandits

Mohammad Shahverdikondori, Jalal Etesami, Negar Kiyavash

Published
May 19, 2026 — 16:01 UTC

Problem
This paper addresses a gap in the contextual multi-armed bandit (CMAB) literature, specifically focusing on the challenge of optimizing simple regret in scenarios with a finite context space. The authors highlight the limitations of passive sampling methods, which do not leverage the structure of the context distribution. They propose a novel approach that allows for active context selection, which is particularly relevant in settings where the learner can control the sampling of subpopulations. This work is presented as a preprint and has not yet undergone peer review.

Method
The core technical contribution is the introduction of an active sampling strategy that optimally selects contexts based on a known context distribution vector ( p ). The authors derive tight regret rates for both passive and active sampling methods. In passive sampling, where contexts are revealed randomly, the regret is characterized as ( O(\sqrt{n/T \, \lVert p \rVert_{1/2}}) ). In contrast, the proposed active sampling strategy allocates resources according to ( q_j \propto p_j^{2/3} ), achieving a tighter regret rate of ( O(\sqrt{n/T} \, \lVert p \rVert_{2/3}) ). The improvement in regret can be as significant as ( Θ(k^{1/4}) ), where ( k ) is the number of contexts. The authors also extend their analysis to budgeted active sampling scenarios, providing a characterization of the regret rate under budget constraints. For cases where ( p ) is unknown, they introduce the Explore-Explore-Then-Commit (EETC) algorithm, which balances exploration of the context distribution with the timing of switching to active allocation, ensuring that it approaches the known ( p ) active rate for large horizons.

Results
The theoretical results demonstrate that the active sampling method significantly outperforms passive sampling in terms of regret. Specifically, the active method achieves a regret rate of ( O(\sqrt{n/T} \, \lVert p \rVert_{2/3}) ), compared to the passive method’s ( O(\sqrt{n/T \, \lVert p \rVert_{1/2}}) ). The experiments conducted on both synthetic and real-world datasets corroborate these findings, showing that the active context selection leads to improved performance metrics, although specific numerical results and comparisons to named baselines are not detailed in the abstract.

Limitations
The authors acknowledge that their results are contingent on the assumption of a known context distribution ( p ) for the optimal active sampling strategy. While they propose the EETC algorithm for unknown ( p ), the performance guarantees are only asymptotic and may not hold in finite settings. Additionally, the analysis does not consider the computational complexity of implementing the active sampling strategy, which could be a practical limitation in real-world applications. The paper does not address potential issues related to the scalability of the proposed methods as the number of contexts increases.

Why it matters
This work has significant implications for the design of algorithms in contextual bandit settings, particularly in applications where context distributions can be manipulated or are known a priori. The findings suggest that active context selection can lead to substantial improvements in decision-making processes, which is critical in fields such as personalized recommendation systems, adaptive clinical trials, and resource allocation problems. The introduction of the EETC algorithm also provides a practical approach for scenarios where context distributions are not readily available, paving the way for further research in adaptive sampling strategies.

Authors: Mohammad Shahverdikondori, Jalal Etesami, Negar Kiyavash
Source: arXiv:2605.20040
URL: https://arxiv.org/abs/2605.20040v1

Turing Wire

By Turing Wire editorial staff · May 19, 2026 · Editorial standards →

Source: arXiv cs.LG