We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model, which indicates that for an MDP with $N$ state-action pairs and discount factor $\gamma\in[0,1)$, only $O(N\log(N/\delta)/((1-\gamma)^3\epsilon^2))$ samples are required to find an $\epsilon$-optimal estimate of the action-value function with probability $1-\delta$. We also prove a matching lower bound of $\Theta(N\log(N/\delta)/((1-\gamma)^3\epsilon^2))$ on the sample complexity of estimating the optimal action-value function by every RL algorithm. To the best of ...
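To make the setting concrete, the following is a minimal Python sketch of model-based value iteration with a generative model, not the authors' implementation. It draws a fixed number of next-state samples for every state-action pair, builds an empirical transition model, and runs Q-value iteration on it. The function names, the two-state toy MDP, and the `constant` argument are illustrative assumptions; the abstract gives only the order of the bound, so `sample_budget` shows the scaling and not an exact sample count.

```python
import math
import random


def sample_budget(num_pairs, gamma, epsilon, delta, constant=1.0):
    """Scaling N*log(N/delta) / ((1-gamma)^3 * epsilon^2).

    `constant` is a placeholder: the abstract states only the order of the bound.
    """
    return (constant * num_pairs * math.log(num_pairs / delta)
            / ((1.0 - gamma) ** 3 * epsilon ** 2))


def q_value_iteration(P, R, gamma, iters=500):
    """Q-value iteration; P[s][a] is a list of next-state probabilities, R[s][a] a reward."""
    n_states = len(P)
    Q = [[0.0 for _ in P[s]] for s in range(n_states)]
    for _ in range(iters):
        V = [max(q) for q in Q]  # greedy state values under the current Q
        Q = [[R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
              for a in range(len(P[s]))]
             for s in range(n_states)]
    return Q


def empirical_model(P, n_samples, rng):
    """Query the generative model n_samples times per state-action pair."""
    n_states = len(P)
    P_hat = []
    for s in range(n_states):
        row = []
        for a in range(len(P[s])):
            draws = rng.choices(range(n_states), weights=P[s][a], k=n_samples)
            row.append([draws.count(t) / n_samples for t in range(n_states)])
        P_hat.append(row)
    return P_hat


# Hypothetical toy MDP: 2 states, 2 actions, so N = 4 state-action pairs.
P_TRUE = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.5, 0.5], [0.1, 0.9]]]
REWARD = [[0.0, 0.5],
          [1.0, 0.2]]
GAMMA = 0.9

rng = random.Random(0)
q_star = q_value_iteration(P_TRUE, REWARD, GAMMA)
q_hat = q_value_iteration(empirical_model(P_TRUE, 200, rng), REWARD, GAMMA)
max_error = max(abs(q_hat[s][a] - q_star[s][a])
                for s in range(2) for a in range(2))
print("max |Q_hat - Q*| with 200 samples per pair:", max_error)
print("budget scaling for epsilon=0.1, delta=0.05:",
      sample_budget(4, GAMMA, 0.1, 0.05))
```

In this sketch the rewards are taken as known and only the transitions are estimated, which is a simplifying assumption. Halving `epsilon` in `sample_budget` multiplies the result by four, and the $(1-\gamma)^3$ factor makes the budget grow quickly as `gamma` approaches 1.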
In contrast to the advances in characterizing the sample complexity for solving Markov decision proc...
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value it...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
We consider the problem of learning the optimal action-value function in disco...
We consider the problem of learning the optimal action-value function in discounted-reward...
We consider the problem of model-free reinforcement learning in the Markovian decision processes (MD...
The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular s...
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Mark...
We settle the sample complexity of policy learning for the maximization of the long run average rewa...
With the increasing need for handling large state and action spaces, general function approximation ...
Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has...
We present a new algorithm for general reinforcement learning where the true environment is known ...
We consider the problem of model-free reinforcement learning (RL) in the Markovian decision processe...