We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets to observe the losses of some other actions. The revealed losses depend on the learner's action and on a directed observation system chosen by the environment. For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial-information setting that models online combinatorial optimization problems ...
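The feedback protocol described above is concrete enough to sketch in code. The snippet below is a minimal illustration under stated assumptions, not the algorithm proposed in the paper: an exponential-weights learner plays an action, then receives the losses of every action reachable from it in that round's directed observation graph, and turns them into importance-weighted estimates using the graph only in hindsight (a small bias term, here called gamma, is added for stability). The function names, the interface (loss_fn, graph_fn), and the parameter values are assumptions made for this example.

```python
import numpy as np

def exp3_side_observations(loss_fn, graph_fn, n_actions, n_rounds,
                           eta=0.1, gamma=0.05, rng=None):
    """Exponential-weights learner with side observations (illustrative sketch).

    loss_fn(t)  -> array of shape (n_actions,) with losses in [0, 1]
    graph_fn(t) -> boolean adjacency matrix G with G[i, j] = True iff
                   playing i reveals the loss of j (G[i, i] assumed True)

    The observation graph is *not* needed before the action is chosen:
    it only enters the importance-weighted loss estimates computed
    after the fact.
    """
    rng = np.random.default_rng() if rng is None else rng
    cum_est = np.zeros(n_actions)          # cumulative estimated losses
    total_loss = 0.0
    for t in range(n_rounds):
        # 1. Sample an action from the exponential-weights distribution.
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = w / w.sum()
        action = rng.choice(n_actions, p=p)

        # 2. Environment reveals this round's losses and observation graph;
        #    the learner sees the losses of all actions observable from its own.
        losses = loss_fn(t)
        G = graph_fn(t)
        observed = G[action]
        total_loss += losses[action]

        # 3. Importance-weighted estimates: each observed loss is divided by
        #    the probability that it was observed, plus a bias gamma that
        #    keeps the estimates (and their variance) bounded.
        obs_prob = p @ G.astype(float)     # P(loss of j observed this round)
        est = np.where(observed, losses / (obs_prob + gamma), 0.0)
        cum_est += est
    return total_loss
```

As a sanity check of how the model interpolates between the two classical regimes, running this sketch with a graph_fn that always returns an all-True matrix reduces it to full-information exponential weights, while an identity-matrix graph_fn reduces it to the standard bandit case.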
We introduce and study a partial-information model of online learning, where a decision maker repeat...
We present and study a partial-information model of online learning, where a decision make...
We investigate a nonstochastic bandit setting in which the loss of an action i...
We propose a new partial-observability model for online learning problems wher...
We consider adversarial multi-armed bandit problems where the learner is allow...
We consider the problem of online combinatorial optimization under semi-bandit...
We consider an adversarial online learning setting where a decision maker can choose an action in ev...
We study the problem of online learning in adversarial bandit problems under a partial observability...
Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are of...
Most work on sequential learning assumes a fixed set of actions that are avail...
We study how to adapt to smoothly-varying (‘easy’) environments in well-known online learning proble...
The bandit classification problem considers learning the labels of a time-indexed data stream under ...
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...