Several recent works have proposed instance-dependent upper bounds on the number of episodes needed to identify, with probability $1-\delta$, an $\varepsilon$-optimal policy in finite-horizon tabular Markov Decision Processes (MDPs). These upper bounds feature various complexity measures for the MDP, which are defined based on different notions of sub-optimality gaps. However, as of now, no lower bound has been established to assess the optimality of any of these complexity measures, except for the special case of MDPs with deterministic transitions. In this paper, we propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy in any tabular episodic MDP. Additionall...
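The sub-optimality gaps referred to in this abstract can be made concrete with a short sketch. It assumes a finite-horizon tabular MDP with known transitions `P[h][s][a][t]` and mean rewards `R[h][s][a]`, a hypothetical nested-list layout chosen only for illustration. Backward induction computes $Q^*_h$ and $V^*_h$, and from them the value gap $\Delta_h(s,a) = V^*_h(s) - Q^*_h(s,a)$. This is one common notion of gap. The works listed here use several variants, and this is not the lower-bound construction of the paper.

```python
from typing import List

# Hypothetical helper for illustration: computes value gaps
# Delta_h(s, a) = V*_h(s) - Q*_h(s, a) by backward induction.
# Assumed layout: P[h][s][a][t] = Pr(next state t | state s, action a, step h),
#                 R[h][s][a]    = mean reward at step h.
def value_gaps(P: List[List[List[List[float]]]],
               R: List[List[List[float]]]) -> List[List[List[float]]]:
    H = len(R)
    S = len(R[0])
    V_next = [0.0] * S  # value beyond the last step is zero
    gaps: List[List[List[float]]] = [[] for _ in range(H)]
    for h in reversed(range(H)):
        # Bellman optimality backup: Q*_h(s, a) = r_h(s, a) + E[V*_{h+1}(s')]
        Q = [[R[h][s][a] + sum(p * v for p, v in zip(P[h][s][a], V_next))
              for a in range(len(R[h][s]))]
             for s in range(S)]
        V = [max(q_s) for q_s in Q]  # V*_h(s) = max_a Q*_h(s, a)
        gaps[h] = [[V[s] - q for q in Q[s]] for s in range(S)]
        V_next = V
    return gaps
```

For example, with two steps and two states, the first-step actions deterministically lead to state 0 or state 1. The gaps at the first step then reflect the difference in optimal future value, and a gap of zero marks an optimal action.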
In high-stakes applications, active experimentation may be considered too risky and thus data are oft...
We consider the problem of learning the optimal action-value function in the d...
Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balanc...
In probably approximately correct (PAC) reinforcement learning (RL), an agent ...
In this paper, we propose new problem-independent lower bounds on the sample c...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
Optimistic algorithms have been extensively studied for regret minimization in...
We consider the problem of learning the optimal action-value function in disco...
In contrast to the advances in characterizing the sample complexity for solving Markov decision proc...
We settle the sample complexity of policy learning for the maximization of the long run average rewa...
We consider a general reinforcement learning problem and show that carefully combining the Bayesian...
We consider the problem of learning the optimal action-value function in discounted-reward...
We present a new algorithm for general reinforcement learning where the true environment is known ...