We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of O(√(α* T)) after T rounds, where α* is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of α*. For the special case of binary edge weights, our settin...
In a partial monitoring game, the learner repeatedly chooses an action, the environment responds wit...
We study the problem of online multiclass classification in a setting where the learner’s feedback i...
We study how to adapt to smoothly-varying (‘easy’) environments in well-known online learning proble...
This study considers online learning with general directed feedback graphs. For this problem, we pre...
We consider a sequential learning problem with Gaussian payoffs and side information: after selecti...
The framework of feedback graphs is a generalization of sequential decision-making with bandit or fu...
We consider an adversarial online learning setting where a decision maker can choose an action in ev...
We introduce and study a partial-information model of online learning, where a decision maker repeat...
We present and study a partial-information model of online learning, where a decision make...
We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir...
We study a partial-information online-learning problem where actions are restricted to noisy...
We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under di...
In online learning, a player chooses actions to play and receives reward and feedback from the envir...