We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by being both more general (it applies to all policies) and tighter. The upper and lower bounds match up to logarithmic factors provided the transition matrix is not too dense.
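A minimal sketch of the shape such a bound takes, under assumed notation (N for the number of non-zero transition probabilities, ε for the accuracy, δ for the failure probability, γ for the discount factor; the abstract itself fixes none of these symbols): cubic dependence on the effective horizon 1/(1-γ), together with linear dependence on N, corresponds to a sample complexity of order

\tilde{O}\!\left( \frac{N}{\varepsilon^{2}\,(1-\gamma)^{3}} \log\frac{1}{\delta} \right),

matching the stated lower bound up to the logarithmic factors hidden by \tilde{O}.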
Optimistic algorithms have been extensively studied for regret minimization in...
We consider the problem of model-free reinforcement learning (RL) in the Markovian decision processe...
Many physical systems have underlying safety considerations that require that the policy employed en...
In probably approximately correct (PAC) reinforcement learning (RL), an agent ...
We consider the problem of learning the optimal action-value function in discounted-reward...
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...
In this paper, we propose new problem-independent lower bounds on the sample c...
Several recent works have proposed instance-dependent upper bounds on the number of episodes needed ...
Reinforcement Learning (RL) in finite state and action Markov Decision Processes is studied with an ...
We consid...
The problem of reinforcement learning in an unknown and discrete Markov Decisi...
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per s...
We consider the problem of learning the optimal action-value function in disco...