Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, from both a minimax and an instance-dependent viewpoint. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2022) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some m...
We study learning algorithms for the classical Markovian bandit problem with discount. We explain ho...
As the reinforcement learning community has shifted its focus from heuristic methods to methods t...
arXiv admin note: text overlap with arXiv:2205.07704. We consider reinforcement learning in an environ...
Recent advances in reinforcement learning have yielded several PAC-MDP algorithms that, using the pr...
Several recent works have proposed instance-dependent upper bounds on the number of episodes needed ...
In probably approximately correct (PAC) reinforcement learning (RL), an agent ...
In this paper, we propose new problem-independent lower bounds on the sample c...
Exploration strategy is an essential part of learning agents in model-based Reinforcement Learning...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
Reinforcement learning (RL) focuses on an essential aspect of intelligent behavior – how an agent ca...
Reinforcement Learning (RL) in finite state and action Markov Decision Processes is studied with an ...
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...