In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, Θ̃(√T), Θ(T^{2/3}), or Θ(T). We provide computationally efficient learning algorithms that achieve the minimax regret within a logarithmic factor for...
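The interaction protocol and the regret definition described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `play_game`, the toy loss and feedback matrices, and the fixed action and outcome sequences are all hypothetical and do not come from the paper, and no learning algorithm is implemented here.

```python
from typing import List, Sequence, Tuple


def play_game(
    loss: Sequence[Sequence[float]],
    feedback: Sequence[Sequence[str]],
    actions: Sequence[int],
    outcomes: Sequence[int],
) -> Tuple[List[str], float]:
    """Replay a finite partial monitoring game (hypothetical helper).

    loss[a][o] and feedback[a][o] are fixed functions of action a and outcome o.
    Returns the feedback signals the learner saw and its regret.
    """
    assert len(actions) == len(outcomes)
    total_loss = 0.0
    signals: List[str] = []
    for a, o in zip(actions, outcomes):
        total_loss += loss[a][o]        # the loss is suffered but never shown
        signals.append(feedback[a][o])  # only the feedback signal is revealed
    # Cumulative loss of each fixed action against the same outcome sequence.
    fixed_losses = [sum(row[o] for o in outcomes) for row in loss]
    regret = total_loss - min(fixed_losses)  # compare with best action in hindsight
    return signals, regret


# Toy game (an assumption): action 0 is uninformative, action 1 reveals the outcome.
toy_loss = [[0.0, 1.0], [1.0, 0.0]]
toy_feedback = [["x", "x"], ["a", "b"]]
signals, regret = play_game(toy_loss, toy_feedback, actions=[1, 0, 0, 0], outcomes=[0, 0, 1, 0])
print(signals, regret)
```

In this toy run the learner suffers a total loss of 2, while the best fixed action (action 0) suffers 1, so the regret is 1. The learner can only base its choices on `signals`, which is what separates partial monitoring from the full-information setting.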
We consider an agent interacting with an environment in a single stream of actions, observations, ...
We examine the problem of regret minimization when the learner is involved in a continuous game with...
This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides t...
Partial monitoring is an expressive framework for sequential decision-making with an abundance of ap...
Many situations involve repeatedly making decisions in an uncertain environment: for instance, decid...
We consider repeated games in which the player, instead of observing the action chosen by the oppone...
In online learning, a player chooses actions to play and receives reward and feedback from the envir...
Partial monitoring is a rich framework for sequential decision making under uncertainty that general...
We propose a novel online learning method for minimizing regret in large extensive-form games. The ...