We study two model selection settings in stochastic linear bandits (LB). In the first setting, which we refer to as feature selection, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models). In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$. However, the agent only has access to misspecified models, i.e., estimates of the centers and radii of the balls. We refer to this setting as parameter selection. For each setting, we develop and analyze a computationally efficient algorithm that is based on a reduction from bandits to full-information problems. This allows us to obtain regret...
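To make the reduction idea concrete, below is a minimal, illustrative sketch (not the paper's exact algorithm) of one standard way to select among $M$ candidate feature maps: a master maintains exponential weights over $M$ base LinUCB learners and updates them with importance-weighted reward estimates, which is the usual way bandit feedback is converted into input for a full-information update. All names here (LinUCB, run_model_selection, reward_fn, eta, alpha) are assumptions introduced for illustration.

```python
import numpy as np

# Illustrative sketch: exponential-weights master over M base LinUCB learners,
# one per candidate feature map. Only the played learner's arm is evaluated,
# and the master is fed an importance-weighted reward estimate.

class LinUCB:
    """Base learner for one candidate feature map phi: arm -> R^d."""
    def __init__(self, dim, reg=1.0, alpha=1.0):
        self.A = reg * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)       # running sum of phi(a) * reward
        self.alpha = alpha           # width of the confidence bonus

    def choose(self, features):
        # features: (n_arms, dim) array of feature vectors for this model
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        return int(np.argmax(features @ theta + self.alpha * bonus))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x


def run_model_selection(feature_maps, reward_fn, T, eta=0.1, seed=0):
    """feature_maps: list of (n_arms x d_m) arrays, one per candidate model."""
    M = len(feature_maps)
    bases = [LinUCB(fm.shape[1]) for fm in feature_maps]
    log_w = np.zeros(M)              # log-weights of the master over models
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        m = rng.choice(M, p=p)                 # master samples a base learner
        a = bases[m].choose(feature_maps[m])   # base learner picks an arm
        r = reward_fn(a)                       # bandit feedback for that arm only
        bases[m].update(feature_maps[m][a], r)
        log_w[m] += eta * r / p[m]             # importance-weighted update
        total += r
    return total
```

The importance weighting (dividing the observed reward by the probability of playing base learner $m$) is what makes the single bandit observation usable by a full-information-style master, while the confidence bonus inside each LinUCB base handles exploration within its own feature map.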