The statistical framework of Generalized Linear Models (GLM) can be applied to sequential problems involving categorical or ordinal rewards associated, for instance, with clicks, likes, or ratings. In the case of binary rewards, logistic regression is well known to be preferable to standard linear modeling. Previous works have shown how to handle GLMs in contextual online learning with bandit feedback when the environment is assumed to be stationary. In this paper, we relax the latter assumption and propose two algorithms based on upper confidence bounds that use either a sliding window or a discounted maximum-likelihood estimator. We provide theoretical guarantees on the behavior of these algorithms for general context...
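For intuition, here is a minimal sketch, assuming a binary-reward (logistic) model, of the discounted maximum-likelihood idea: the log-likelihood of past rounds is reweighted by geometrically decaying factors gamma^(T-t) before solving for the estimate, so recent observations dominate and the estimator can track a drifting parameter. The Newton solver, the L2 regulariser `lam`, and all names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discounted_logistic_mle(X, r, gamma, lam=1.0, n_iter=50):
    """Discounted (weighted) maximum-likelihood estimate for a logistic
    reward model, solved by Newton's method. Round t receives weight
    gamma**(T-1-t), so the newest round has weight 1."""
    T, d = X.shape
    w = gamma ** np.arange(T - 1, -1, -1)
    theta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (w * (p - r)) + lam * theta                   # weighted score
        hess = (X * (w * p * (1 - p))[:, None]).T @ X + lam * np.eye(d)
        theta -= np.linalg.solve(hess, grad)                       # Newton step
    return theta

# Toy check: the true parameter switches halfway through the horizon;
# with gamma < 1 the estimate should lean towards the post-change value.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
theta_true = np.where(np.arange(400)[:, None] < 200, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
r = rng.binomial(1, sigmoid(np.sum(X * theta_true, axis=1))).astype(float)
print(discounted_logistic_mle(X, r, gamma=0.98))
```

A sliding-window variant would instead set the weights to 1 on the last tau rounds and 0 elsewhere; the upper-confidence-bound construction on top of either estimator follows the paper and is not reproduced here.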
We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where t...
In this note we identify several mistakes appearing in the existing literature...
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selectio...
We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samp...
We consider a stochastic linear bandit model in which the available actions c...
We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) frame...
We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary...
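As the name suggests, GLR-klUCB pairs a generalized likelihood ratio (GLR) change-point test with the kl-UCB index policy, restarting statistics when a change is detected. Below is a hedged sketch of the Bernoulli GLR test alone; the detection threshold and the restart logic follow the paper and are treated as given here.

```python
import numpy as np

def kl_bern(p, q, eps=1e-12):
    """Bernoulli KL divergence kl(p, q), clipped away from 0 and 1."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def bernoulli_glr_change(x, threshold):
    """GLR test for a single change point in a Bernoulli stream x[0..n-1]:
    scan every split s, compare the two-segment fit against the pooled
    one, and return the best (split, statistic) pair exceeding
    `threshold`, or None if no change is detected."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_all = x.mean()
    best = None
    for s in range(1, n):
        mu1, mu2 = x[:s].mean(), x[s:].mean()
        stat = s * kl_bern(mu1, mu_all) + (n - s) * kl_bern(mu2, mu_all)
        if stat >= threshold and (best is None or stat > best[1]):
            best = (s, stat)
    return best

# Toy usage: the mean jumps from 0.2 to 0.8 at t = 150.
rng = np.random.default_rng(1)
x = np.concatenate([rng.binomial(1, 0.2, 150), rng.binomial(1, 0.8, 150)])
print(bernoulli_glr_change(x, threshold=10.0))   # split near 150
```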
We propose a novel algorithm for generalized linear contextual bandits (GLBs) with a regret bound su...
The bandit problem models a sequential decision process between a player and an environment. In the ...
We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds signifi...
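For reference, the classical fixed-design interval that such bounds are typically compared against scales with the norm of the query point under the inverse observed Fisher information at the maximum-likelihood estimate. The sketch below is that textbook baseline, not the improved bounds of the paper, and `radius` stands in for the confidence-level constant.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_logit_ci(X, theta_hat, x_query, radius):
    """Classical ellipsoidal confidence interval for the logit
    x_query @ theta in a fixed-design logistic model: the half-width is
    radius times the norm of x_query in the inverse observed Fisher
    information at theta_hat."""
    p = sigmoid(X @ theta_hat)
    H = (X * (p * (1 - p))[:, None]).T @ X       # observed Fisher information
    half_width = radius * np.sqrt(x_query @ np.linalg.solve(H, x_query))
    center = x_query @ theta_hat
    return center - half_width, center + half_width
```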
We consider nonstationary multi-armed bandit problems where the model parameters of the arms change ...
We consider the problem of online learning in misspecified linear stochastic multi-armed bandit prob...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
Contextual multi-armed bandit (MAB) algorithms have shown promise for maximizing cumulative r...