In this paper we investigate the multi-armed bandit problem, where each arm generates an infinite sequence of Bernoulli-distributed rewards. The parameters of these Bernoulli distributions are unknown and are initially assumed to be Beta-distributed. Every time a bandit is selected, its Beta distribution is updated with the new information in a Bayesian way. The objective is to maximize the long-term discounted reward. We study the relationship between the necessity of acquiring additional information and the reward. This is done by considering two extreme situations which occur when a bandit has been played N times: the situation where the decision maker stops learning and the situation where the decision maker acquires full information about that b...
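The Bayesian update and the discounted objective described in the abstract can be illustrated with a minimal sketch. It assumes the standard conjugate Beta-Bernoulli update (a success increments the first Beta parameter, a failure the second) and a geometric discount factor. The names `BetaArm`, `update`, `discounted_return`, and `gamma` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BetaArm:
    """Belief about one arm's unknown Bernoulli parameter (hypothetical helper)."""
    alpha: float = 1.0  # prior pseudo-count of successes (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def mean(self) -> float:
        # Posterior expected success probability of the arm.
        return self.alpha / (self.alpha + self.beta)

    def update(self, reward: int) -> "BetaArm":
        # Conjugate update: a Bernoulli reward of 1 raises alpha, 0 raises beta.
        if reward not in (0, 1):
            raise ValueError("Bernoulli rewards must be 0 or 1")
        return BetaArm(self.alpha + reward, self.beta + (1 - reward))


def discounted_return(rewards: Iterable[int], gamma: float) -> float:
    # Sum of gamma**t * r_t over a finite reward sequence, t starting at 0.
    return sum(r * gamma ** t for t, r in enumerate(rewards))


arm = BetaArm()
for r in (1, 1, 0):          # two successes, then one failure
    arm = arm.update(r)
posterior_mean = arm.mean()  # Beta(3, 2) after the three plays
total = discounted_return([1, 0, 1], gamma=0.5)
```

Starting from a uniform Beta(1, 1) prior, two successes and one failure give a Beta(3, 2) belief. The infinite-horizon objective in the abstract would sum over an unbounded reward sequence; the finite sum here only shows how the discounting is applied.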
We consider the problem of finding the best arm in a stochastic multi-armed ba...
We propose a Bayesian information-geometric approach to the exploration-exploi...
The two-armed bandit problem is a classical optimization problem where a player sequentially selects...
We consider a multiarmed bandit problem where the expected reward of each arm is a linear function o...
A bandit problem with infinitely many Bernoulli arms is considered. The parameters of Be...
We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independen...
We consider a class of multi-armed bandit problems where the reward obtained by pulling an a...
How people achieve long-term goals in an imperfectly known environment, via repeated tries and noisy...
We propose a theoretical and computational framework for approximating the optimal policy in multi-a...
We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernou...
We present a formal model of human decision-making in explore-exploit tasks using the conte...
We introduce the budget-limited multi-armed bandit (MAB), which captures situations where a learner’...