Abstract: The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, and Schapire in 1995, is a game of repeatedly choosing one decision from a set of decisions ("experts") under partial observation: in each round t, only the cost of the decision played is observable. A regret minimization algorithm plays this game while achieving sublinear regret relative to each decision. It is known that an adversary controlling the costs of the decisions can force on the player a regret growing as t^(1/2) in the time t. In this work, we propose the first algorithm for a countably infinite set of decisions that achieves a regret upper bounded by O(t^(1/2+ε)), i.e. arbitrarily close to the optimal order. To this aim, we build on the "foll...
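The game described above — pick one arm per round, observe only its cost, compete with the best fixed arm — can be illustrated with the classic Exp3 algorithm for the (finite-arm) nonstochastic setting. This is a minimal sketch for intuition, not the paper's algorithm for countably infinite decision sets; the cost matrix and learning rate `eta` are illustrative choices.

```python
import math
import random

def exp3(num_arms, costs, eta):
    """Run Exp3 on a fixed adversarial cost sequence.

    costs[t][i] is the cost (in [0, 1]) of arm i at round t; only
    costs[t][arm] is revealed to the player each round (bandit feedback).
    Returns the total cost incurred.
    """
    weights = [1.0] * num_arms
    total_cost = 0.0
    for round_costs in costs:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Sample an arm from the current exponential-weights distribution.
        arm = random.choices(range(num_arms), weights=probs)[0]
        cost = round_costs[arm]  # the only entry the player observes
        total_cost += cost
        # Importance-weighted estimate: unbiased for the full cost vector
        # because the played arm's cost is divided by its play probability.
        estimate = cost / probs[arm]
        weights[arm] *= math.exp(-eta * estimate)
    return total_cost
```

With K arms and a suitably tuned eta of order sqrt(log K / (t K)), this scheme attains expected regret of order sqrt(t K log K), matching the t^(1/2) growth that the adversary can force.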
In many decision problems, there are two levels of choice: The first one is strategic and the second...
We extend the notion of regret with a welfarist perspective. Focussing on the classic multi-armed ba...
Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
This thesis investigates sequential decision making tasks that fall in the framework of reinforcemen...
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the...
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and ob...
We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir...
We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic ba...
Multi-armed bandit problems are the most basic examples of sequential decision problems with an expl...
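The exploration–exploitation trade-off that these snippets refer to is commonly illustrated by the UCB1 index policy for stochastic bandits. The sketch below is a textbook illustration under assumed Bernoulli rewards, not an algorithm from any of the works listed here; the means and horizon are arbitrary.

```python
import math
import random

def ucb1(arm_means, horizon, rng):
    """Play UCB1 for `horizon` rounds against Bernoulli arms.

    Each round, pull the arm maximizing (empirical mean + confidence
    bonus); the bonus shrinks as an arm is sampled more, so the policy
    explores under-sampled arms and exploits apparently good ones.
    Returns (total reward, pull counts per arm).
    """
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    total_reward = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t + 1) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts
```

Over a long horizon, the suboptimal arms are pulled only logarithmically often, which is the hallmark guarantee of this family of index policies.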
We consider stochastic multi-armed bandits where the expected reward is a unimodal function over pa...
In many online decision processes, the optimizing agent is...