We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of O(√(α* T)) after T rounds, where α* is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of α*. For the special case of binary edge weights, our settin...
We introduce and study a partial-information model of online learning, where a decision maker repeat...
We consider an online learning problem with one-sided feedback, in which the learner is able to obse...
We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir...
We propose a new partial-observability model for online learning problems wher...
We consider online learning problems under a partial observability model cap...
We consider adversarial multi-armed bandit problems where the learner is allow...
The framework of feedback graphs is a generalization of sequential decision-making with bandit or fu...
We consider a sequential learning problem with Gaussian payoffs and side observations: afte...
This study considers online learning with general directed feedback graphs. For this problem, we pre...
We consider an adversarial online learning setting where a decision maker can choose an action in ev...
We study the problem of online learning in adversarial bandit problems under a partial observability...
We consider the problem of online combinatorial optimization under semi-bandit...
We study the interplay between feedback and communication in a cooperative online learning setting w...
This document is the full version of an extended abstract published in the proceedings of COLT 2017....