Abstract
We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that, for an MDP with $N$ state-action pairs and discount factor $\gamma \in [0, 1)$, only $O\big(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)\big)$ state-transition samples are required to find an $\epsilon$-optimal estimate of the action-value function with probability (w.p.) $1-\delta$. Further, we prove that, for small values of $\epsilon$, an order of $O\big(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)\big)$ samples is required to find an $\epsilon$-optimal policy w.p. 1 − ...
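The abstract describes model-based RL with a generative model: draw next-state samples for every state-action pair, build an empirical model of the MDP, and plan on that model. The Python sketch below illustrates that pipeline and the scaling of the stated sample bound under explicit assumptions. It is not the paper's exact algorithm or analysis. The names `sample_budget`, `empirical_qvi` and `sample_next` are hypothetical, rewards are assumed known and deterministic, and the constant `c` is a placeholder because the $O(\cdot)$ bound does not specify it.

```python
import math
import random


def sample_budget(n_pairs, gamma, eps, delta, c=1.0):
    """Return c * N * log(N / delta) / ((1 - gamma)^3 * eps^2).

    The O(.) bound hides its constant, so `c` is a hypothetical placeholder:
    the value only shows how the budget scales with N, gamma, eps and delta.
    """
    return c * n_pairs * math.log(n_pairs / delta) / ((1.0 - gamma) ** 3 * eps ** 2)


def empirical_qvi(sample_next, rewards, n_states, n_actions, gamma, m, iters, rng):
    """Model-based Q-value iteration using a generative model.

    Assumptions (hypothetical interface, not taken from the paper):
      * sample_next(s, a, rng) returns one next state drawn from P(. | s, a);
      * rewards[s][a] is known and deterministic.
    Draws m next-state samples per state-action pair, builds the empirical
    model P_hat, then runs `iters` sweeps of value iteration on that model.
    """
    # Empirical transition model: P_hat[s][a][t] = (# samples landing in t) / m.
    p_hat = []
    for s in range(n_states):
        row = []
        for a in range(n_actions):
            counts = [0] * n_states
            for _ in range(m):
                counts[sample_next(s, a, rng)] += 1
            row.append([cnt / m for cnt in counts])
        p_hat.append(row)

    # Bellman optimality backups on the empirical MDP:
    # Q(s, a) <- r(s, a) + gamma * sum_t P_hat(t | s, a) * max_b Q(t, b)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        v = [max(q_s) for q_s in q]
        q = [
            [
                rewards[s][a]
                + gamma * sum(p_hat[s][a][t] * v[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return q


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP: action 1 in state 0 reaches the
    # rewarding state 1 with probability 0.8; state 1 is absorbing.
    def sample_next(s, a, rng):
        if s == 1:
            return 1
        if a == 1 and rng.random() < 0.8:
            return 1
        return 0

    rewards = [[0.0, 0.0], [1.0, 1.0]]
    n_pairs, gamma, eps, delta = 4, 0.9, 0.5, 0.05
    # Per-pair sample count implied by the budget formula with c = 1.
    m = math.ceil(sample_budget(n_pairs, gamma, eps, delta) / n_pairs)
    q = empirical_qvi(sample_next, rewards, 2, 2, gamma, m, 200, random.Random(0))
    print(m, q)
```

Read through `sample_budget`, the bound is quadratic in $1/\epsilon$ and cubic in the effective horizon $1/(1-\gamma)$. Halving $\epsilon$ quadruples the budget, and moving $\gamma$ from 0.9 to 0.99 multiplies it by 1000.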
The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular s...
Reinforcement Learning (RL) in finite state and action Markov Decision Processes is studied with an ...
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of th...
Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012). We consid...
International audience. We consider the problem of learning the optimal action-value function in disco...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
We consider the problem of model-free reinforcement learning (RL) in the Markovian decision processe...
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Mark...
We consider the problem of model-free reinforcement learning in the Markovian decision processes (MD...
Several recent works have proposed instance-dependent upper bounds on the number of episodes needed ...
Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and ...
International audience. In probably approximately correct (PAC) reinforcement learning (RL), an agent ...