Recent advances in reinforcement learning have yielded several PAC-MDP algorithms that, using the principle of optimism in the face of uncertainty, are guaranteed to act near-optimally with high probability on all but a polynomial number of samples. Unfortunately, many of these algorithms, such as R-MAX, perform poorly in practice because their initial exploration in each state, before the associated model parameters have been learned with confidence, is random. Others, such as Model-Based Interval Estimation (MBIE), have weaker sample complexity bounds and require careful parameter tuning. This paper proposes a new PAC-MDP algorithm called V-MAX designed to address these problems. By restricting its optimism to future visits, V-MAX can expl...
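The criticism of R-MAX above concerns how it treats state-action pairs that have not yet been tried often enough. As a rough illustration only (not code from this paper, and not V-MAX), the sketch below follows the common textbook description of R-MAX: a pair is "known" once it has been visited at least m times; known pairs use the empirical model, and unknown pairs are valued as if they led to a fictitious state that pays the maximum reward forever, i.e. r_max / (1 - gamma). The function name `rmax_q_values`, its argument layout, and the fixed number of value-iteration sweeps are hypothetical choices made for this example.

```python
import numpy as np


def rmax_q_values(counts, trans_counts, reward_sums, m, r_max, gamma, iters=500):
    """Optimistic Q-values in the style of R-MAX (illustrative sketch).

    counts[s, a]           -- number of times (s, a) was tried
    trans_counts[s, a, s2] -- number of observed transitions s -(a)-> s2
    reward_sums[s, a]      -- sum of rewards observed after taking a in s
    m                      -- hypothetical "known" threshold on visit counts
    """
    n_states, n_actions = counts.shape
    known = counts >= m

    # Empirical model; max(count, 1) avoids division by zero for unvisited pairs,
    # whose estimates are never used because those pairs are unknown.
    safe = np.maximum(counts, 1)
    p_hat = trans_counts / safe[:, :, None]
    r_hat = reward_sums / safe

    # Value of the fictitious absorbing state that always pays r_max.
    v_opt = r_max / (1.0 - gamma)

    q = np.full((n_states, n_actions), v_opt)
    for _ in range(iters):
        v = q.max(axis=1)                    # greedy state values
        q_known = r_hat + gamma * (p_hat @ v)  # Bellman backup on the empirical model
        q = np.where(known, q_known, v_opt)  # unknown pairs stay maximally optimistic
    return q
```

For example, with two states and two actions, if only (state 0, action 0) is known and is a zero-reward self-loop, its value drops to 0.9 * 10 = 9 while the untried action 1 keeps the optimistic value 10, so the greedy agent tries action 1 next. Because every unknown action in a state receives the same optimistic value, the choice among them is settled purely by tie-breaking (here, `argmax` returns the lowest index), which is the undirected initial exploration the abstract points to.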
Recent Reinforcement Learning (RL) algorithms, such as R-MAX, make (with high probability) only a sm...
In this paper, we propose new problem-independent lower bounds on the sample c...
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...
Exploration strategy is an essential part of learning agents in model-based Reinforcement Learning...
Optimistic algorithms have been extensively studied for regret minimization in...
Reinforcement Learning (RL) in finite state and action Markov Decision Processes is studied with an ...
As the reinforcement learning community has shifted its focus from heuristic methods to methods t...
Several recent works have proposed instance-dependent upper bounds on the number of episodes needed ...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
In probably approximately correct (PAC) reinforcement learning (RL), an agent ...
Reinforcement learning (RL) focuses on an essential aspect of intelligent behavior – how an agent ca...
PAC-MDP algorithms are particularly efficient in terms of the number of samples obtained from the e...
Formal exploration approaches in model-based reinforcement learning estimate the accuracy of the cur...