We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinU...
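To make the idea of discounted linear regression with exponential weights concrete, here is a minimal sketch of a weighted ridge estimate in which the observation from step s receives weight gamma**(t - s). The function name `discounted_least_squares`, the discount `gamma`, and the regularization `lam` are illustrative assumptions, not taken from the abstract. The sketch covers only the estimator, not the confidence bounds or the optimistic action selection of D-LinUCB.

```python
import numpy as np

def discounted_least_squares(actions, rewards, gamma=0.95, lam=1.0):
    """Hypothetical helper: ridge regression with exponentially decaying weights.

    actions: array of shape (t, d), one context vector per past step
    rewards: array of shape (t,), the observed rewards
    gamma:   assumed discount factor in (0, 1]; gamma = 1 gives plain ridge regression
    lam:     assumed regularization strength
    """
    actions = np.asarray(actions, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    t, d = actions.shape
    # The most recent observation gets weight 1, the oldest gets gamma**(t - 1).
    weights = gamma ** np.arange(t - 1, -1, -1)
    # Weighted design matrix plus regularization.
    V = (actions * weights[:, None]).T @ actions + lam * np.eye(d)
    b = actions.T @ (weights * rewards)
    return np.linalg.solve(V, b)

# Noise-free toy data: the parameter switches from (1, 0) to (0, 1) at step 200.
actions = np.array([[1.0, 0.0], [0.0, 1.0]] * 125)
theta_old, theta_new = np.array([1.0, 0.0]), np.array([0.0, 1.0])
rewards = np.concatenate([actions[:200] @ theta_old, actions[200:] @ theta_new])

est_discounted = discounted_least_squares(actions, rewards, gamma=0.9)
est_plain = discounted_least_squares(actions, rewards, gamma=1.0)
```

On this toy data the discounted estimate should track the most recent parameter more closely than the undiscounted one, because observations from before the change carry almost no weight.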
The stochastic generalised linear bandit is a well-understood model for sequential decision-making p...
In the X-armed bandit problem, an agent sequentially interacts with an environment that yields a reward bas...
We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samp...
The statistical framework of Generalized Linear Models (GLM) can be applied to sequential problems i...
We consider the problem of online learning in misspecified linear stochastic multi-armed bandit prob...
We propose a new online algorithm for minimizing the cumulative regret in stochastic linear bandits....
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selectio...
This paper considers contextual bandits with a finite number of arms, where the contexts are indepen...
We consider the setting of stochastic bandit problems with a continuum of arms...
We consider nonstationary multi-armed bandit problems where the model parameters of the arms change ...
We study a variant of stochastic bandits where the feedback model is specified by a graph. In this s...
We propose a new bootstrap-based online algorithm for stochastic linear bandit problems. The key ide...
This paper explores a new form of the linear bandit problem in which the algorithm receives the usua...
In many real-world sequential decision-making problems, an action does not immediately reflect on th...