In sequential decision problems in an unknown environment, the decision maker often faces a dilemma: whether to explore to discover more about the environment, or to exploit current knowledge. We address the exploration-exploitation dilemma in a general setting encompassing both standard and contextualised bandit problems. The contextual bandit problem has recently resurfaced in attempts to maximise click-through rates in web-based applications, a task of significant commercial interest. In this article we consider the approach of Thompson (1933), which makes use of samples from the posterior distributions for the instantaneous value of each action. We extend the approach by introducing a new algorithm, Optimistic Bayesian Sampling (OBS...
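The Thompson (1933) idea described above draws one sample from each action's posterior and then acts greedily on the samples. The sketch below illustrates it for the non-contextual case only. It assumes Bernoulli rewards and independent Beta(1, 1) priors. The function `thompson_sampling` and the parameter `true_probs` are hypothetical names used for illustration. This is plain Thompson sampling, not the OBS algorithm introduced in the abstract.

```python
import random


def thompson_sampling(true_probs, horizon, seed=0):
    """Plain Bernoulli Thompson sampling with independent Beta(1, 1) priors.

    Illustrative sketch only (not OBS). `true_probs` are hypothetical arm
    success probabilities used to simulate the environment; the learner
    never reads them directly, it only sees the sampled 0/1 rewards.
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior over its mean reward.
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        # Act greedily with respect to the sampled values (ties go to the lowest index).
        arm = max(range(k), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate Beta-Bernoulli update of the chosen arm's posterior.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.2, 0.5, 0.7], horizon=5000)
    print(pulls, total)
```

Exploration here comes only from posterior uncertainty. An arm with few observations has a wide Beta posterior, so it occasionally produces the largest sample and gets tried. As evidence accumulates, the posteriors concentrate and the best arm is chosen almost always. A contextual version would replace the per-arm Beta posteriors with a posterior over a model of reward given context.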
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision prob...
Published version of a chapter in the book: IFIP Advances in Information and Communication Technolog...
In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solut...
We address the problem of online sequential decision making, i.e., balancing the trade-off between e...
Machine and Statistical Learning techniques are used in almost all online advertisement systems. The...
Machine and Statistical Learning techniques are used in almost all online advertisement systems. The...
This paper considers the use of a simple posterior sampling algorithm to balance between exploration...
Contextual bandit algorithms are essential for solving many real-world interac...
Sequential decision-making is an iterative process between a learning agent and an environment. We s...
We present a method to solve the problem of choosing a set of adverts to display to each of a sequen...
Originally motivated by default risk management applications, this paper inves...
This paper is about index policies for minimizing (frequentist) regret in a st...
We address the problem of regret minimization in logistic contextual bandits, where a learner decide...
We address the problem of regret minimization in logistic contextual bandits, where a learner decide...
The Multi-armed Bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. I...