Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies that $T = \Omega(SA)$ time is needed to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a factored MDP, it is possible to achieve regret that scales polynomially in the number of parameters encoding the factored MDP, which may be exponentially smaller than $S$ or $A$. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sa...
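To make the gap between a flat and a factored encoding concrete, the following minimal sketch counts transition-table entries under each representation. It is an illustration under stated assumptions, not the construction used in the paper: the helper names, the ring-shaped dependency structure, and the choice to leave the action unfactored are all hypothetical simplifications.

```python
from math import prod


def flat_transition_params(num_states: int, num_actions: int) -> int:
    """Entries in a flat transition table: one probability per (s, a, s')."""
    return num_states * num_actions * num_states


def factored_transition_params(factor_sizes, num_actions, parent_sets) -> int:
    """Entries in a factored transition model.

    Assumption (hypothetical simplification): factor i of the next state
    depends only on the current values of the factors in parent_sets[i]
    and on the (unfactored) action.
    """
    total = 0
    for i, parents in enumerate(parent_sets):
        parent_configs = prod(factor_sizes[j] for j in parents)
        total += parent_configs * num_actions * factor_sizes[i]
    return total


# Hypothetical example: 10 binary state factors arranged in a ring, where
# each factor depends on itself and its left neighbour, with 4 actions.
n = 10
factor_sizes = [2] * n
parent_sets = [[i, (i - 1) % n] for i in range(n)]
num_actions = 4

S = prod(factor_sizes)  # 2**10 joint states
flat = flat_transition_params(S, num_actions)
factored = factored_transition_params(factor_sizes, num_actions, parent_sets)

print(f"S = {S}, flat entries = {flat}, factored entries = {factored}")
```

In this toy setting the flat table grows with the square of the joint state count, while the factored count grows linearly in the number of factors, which is the kind of exponential saving the abstract refers to.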
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
Most provably-efficient reinforcement learning algorithms introduce optimism about poorly-understoo...
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on ...
We consider an agent interacting with an environment in a single stream of actions, observations, an...
The problem of reinforcement learning in an unknown and discrete Markov Decisi...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
Reinforcement learning (RL) has gained an increasing interest in recent years, being expected to del...
In this paper, we revisit the regret of undiscounted reinforcement learning in MDPs with a birth and...
We consider a class of sequential decision making problems in the presence of uncertainty, which bel...
We consider a regret minimization task under the average-reward criterion in a...
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogen...