In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state, and a terminal action that moves the learner directly into one of the two terminal states (the goal state or the dead-end state). A round ends when a terminal state is reached, and the learner receives a positive reward only when the goal state is reached. The objective of the learner is to maximize its long-term reward (the expected number of times the goal state is reached), without having any prior knowledge...
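The round structure described above can be illustrated with a minimal simulation sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the states are the integers 0 to `goal` (0 is the dead-end state), the continuation action steps up by one with probability `p_cont` and down by one otherwise, the terminal action reaches the goal with probability `p_term`, and the reward is 1. The names `play_round` and `average_reward` and the `policy` callback are hypothetical.

```python
import random


def play_round(start, goal, p_cont, p_term, policy, rng):
    """Simulate one round; return 1 if the goal state is reached, else 0.

    Assumed model (not from the paper): states 0..goal, where 0 is the
    dead-end state and `goal` is the goal state.
    """
    state = start
    while 0 < state < goal:
        if policy(state) == "terminal":
            # Terminal action: jump directly into one of the terminal states.
            state = goal if rng.random() < p_term else 0
        else:
            # Continuation action: random step to a neighbouring state.
            state += 1 if rng.random() < p_cont else -1
    return 1 if state == goal else 0


def average_reward(n_rounds, start, goal, p_cont, p_term, policy, rng):
    """Fraction of rounds that end in the goal state."""
    wins = sum(
        play_round(start, goal, p_cont, p_term, policy, rng)
        for _ in range(n_rounds)
    )
    return wins / n_rounds


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical policy: take the terminal action only in state 1.
    def policy(state):
        return "terminal" if state == 1 else "continue"

    print(average_reward(10_000, start=2, goal=4, p_cont=0.45,
                         p_term=0.3, policy=policy, rng=rng))
```

In this sketch the learner is handed `p_cont` and `p_term`; in the problem described above these would be unknown and would have to be learned across rounds.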
Part 5: Machine Learning. The multi-armed bandit problem has been studied for de...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and ob...
Cataloged from PDF version of article. Thesis (M.S.): Bilkent University, Department of Electrical an...
In the X-armed bandit problem, an agent sequentially interacts with an environment that yields a reward bas...
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms,...
To appear. In this paper we consider a particular class of problems called mult...
This thesis investigates sequential decision making tasks that fall in the framework of reinforcemen...
Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the e...
We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics...
Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
Abstract. A multi-armed bandit problem is a search problem on which a learning agent must select the o...
Abstract. The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, a...
We consider a bandit problem where at any time, the decision maker can add new arms to her considera...