We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret, defined as the difference between the total loss suffered by the learner and the total loss of the best stationary policy. We assume that the learner is given access to a finite action space $A$ and that the state space $X$ has a layered structure with $L$ layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after $T$ episodes is at most $2\sqrt{L|X||A|T\log(|X||A|/L)}$ in the ban...
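The regret notion above can be made concrete on a toy layered MDP. The sketch below is a minimal illustration under stated assumptions, not an implementation of the Relative Entropy Policy Search variant: the transition kernel, the randomly drawn losses (an oblivious adversary), the uniform-play learner and every name (`occupancy`, `regret`, `o_reps_bound`) are hypothetical. It computes expected episode losses through occupancy measures, takes the best deterministic stationary policy in hindsight by brute force, and evaluates the quoted bound with $L$ taken as the number of transitions per episode and $|X|$ counting every state (both indexing assumptions).

```python
import itertools
import math
import random

# Hypothetical toy layered MDP: transitions only go from layer l to layer l + 1,
# with a single start state and a single terminal state.
LAYERS = [["s0"], ["a1", "b1"], ["a2", "b2"], ["end"]]
ACTIONS = [0, 1]
L = len(LAYERS) - 1  # decisions per episode (assumed indexing of "L layers")
STATES = [x for layer in LAYERS for x in layer]
DECISION_STATES = [x for layer in LAYERS[:-1] for x in layer]

rng = random.Random(0)


def random_distribution(support):
    """Random probability vector over the given next-layer states."""
    weights = [rng.random() + 1e-3 for _ in support]
    total = sum(weights)
    return {y: w / total for y, w in zip(support, weights)}


# P[x][a] maps each next-layer state to its transition probability.
P = {
    x: {a: random_distribution(LAYERS[l + 1]) for a in ACTIONS}
    for l in range(L)
    for x in LAYERS[l]
}


def occupancy(policy):
    """q(x, a): probability of visiting x and playing a during one episode.

    `policy[x]` maps each action to its probability (stationary, possibly stochastic).
    """
    mu = {LAYERS[0][0]: 1.0}
    q = {}
    for l in range(L):
        nxt = {y: 0.0 for y in LAYERS[l + 1]}
        for x in LAYERS[l]:
            for a in ACTIONS:
                mass = mu.get(x, 0.0) * policy[x][a]
                q[(x, a)] = mass
                for y, p in P[x][a].items():
                    nxt[y] += mass * p
        mu = nxt
    return q


def expected_loss(policy, loss):
    """Expected episode loss <q^pi, loss> for one fixed loss function."""
    q = occupancy(policy)
    return sum(q[xa] * loss[xa] for xa in q)


def regret(learner_policies, losses):
    """Learner's total expected loss minus that of the best stationary policy.

    The comparator total is linear in the occupancy measure, so its minimum over
    stationary policies is attained by a deterministic one; we enumerate all
    |A|^(number of decision states) of them (toy sizes only).
    """
    learner_total = sum(expected_loss(pi, loss) for pi, loss in zip(learner_policies, losses))
    best_total = math.inf
    for choice in itertools.product(ACTIONS, repeat=len(DECISION_STATES)):
        pi = {x: {a: float(a == c) for a in ACTIONS} for x, c in zip(DECISION_STATES, choice)}
        best_total = min(best_total, sum(expected_loss(pi, loss) for loss in losses))
    return learner_total - best_total


def o_reps_bound(L, n_states, n_actions, T):
    """Evaluate 2 * sqrt(L |X| |A| T log(|X| |A| / L)), the bound quoted above."""
    return 2.0 * math.sqrt(L * n_states * n_actions * T * math.log(n_states * n_actions / L))


# Oblivious adversary (assumption): losses in [0, 1] fixed up front for every episode.
T = 50
losses = [{(x, a): rng.random() for x in DECISION_STATES for a in ACTIONS} for _ in range(T)]
uniform = {x: {a: 1.0 / len(ACTIONS) for a in ACTIONS} for x in DECISION_STATES}
print("regret of uniform play:", regret([uniform] * T, losses))
print("bound for T =", T, ":", o_reps_bound(L, len(STATES), len(ACTIONS), T))
```

Brute-force enumeration only scales to toy problems. Because the transitions are fixed, the best stationary policy in hindsight could instead be found by backward dynamic programming on the summed losses. The bound applies to the paper's algorithm, not to the uniform learner used here, so the two printed numbers are not meant to be compared.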
We study the problem of online learning and online regret minimization when samples are drawn from a...
Reinforcement learning (RL) has attracted increasing interest in recent years, as it is expected to del...
We consider the problem of minimizing the long-term average expected regret of an agent in an online...
We consider online learning in finite stochastic Markovian environments where ...
We study the problem of online learning Markov Decision Processes (MDPs) when both the transition di...
This dissertation describes sequential decision-making problems in non-stationary environments. O...
We consider the problem of online reinforcement learning when several state re...
In this paper we consider online learning in finite Markov decision processes (MDPs) with changing ...
We consider the learning problem under an online Markov decision process (MDP), which is aimed at le...
We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-h...
We consider online learning in finite stochastic Markovian environments where in each time ...
We study online learning in adversarial communicating Markov Decision Processes with full informatio...