We study learning algorithms for the classical Markovian bandit problem with discount. We explain how to adapt PSRL [24] and UCRL2 [2] to exploit the problem structure. These variants are called MB-PSRL and MB-UCRL2. While the regret bound and runtime of vanilla implementations of PSRL and UCRL2 are exponential in the number of bandits, we show that the episodic regret of MB-PSRL and MB-UCRL2 is $\tilde{O}(S\sqrt{nK})$, where $K$ is the number of episodes, $n$ is the number of bandits and $S$ is the number of states of each bandit (the exact bound in $S$, $n$ and $K$ is given in the paper). Up to a factor $\sqrt{S}$, this matches the lower bound of $\Omega(\sqrt{SnK})$ that we also derive in the paper. MB-PSRL is also computationally efficient: its...
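The abstract attributes the exponential cost of vanilla PSRL and UCRL2 to the number of bandits, and states that the $\tilde{O}(S\sqrt{nK})$ upper bound is within a factor $\sqrt{S}$ of the $\Omega(\sqrt{SnK})$ lower bound. The sketch below makes both points concrete with plain arithmetic, assuming the vanilla algorithms are run on the global MDP whose state space is the product of the $n$ per-bandit state spaces. The function names are hypothetical, and the bound comparison drops constants and logarithmic factors, so it only illustrates the order of the gap.

```python
import math


def joint_state_count(n_bandits: int, states_per_bandit: int) -> int:
    # Hypothetical helper: size of the global MDP state space, i.e. the
    # Cartesian product of n per-bandit state spaces, each of size S.
    return states_per_bandit ** n_bandits


def per_bandit_state_count(n_bandits: int, states_per_bandit: int) -> int:
    # Hypothetical helper: number of (bandit, state) pairs a learner that
    # models each bandit separately has to keep track of.
    return n_bandits * states_per_bandit


def bound_ratio(S: int, n: int, K: int) -> float:
    # Ratio of S * sqrt(nK) to sqrt(SnK), with constants and log factors
    # dropped; algebraically this simplifies to sqrt(S).
    upper = S * math.sqrt(n * K)
    lower = math.sqrt(S * n * K)
    return upper / lower


if __name__ == "__main__":
    n, S, K = 10, 4, 1000
    print(joint_state_count(n, S))       # 1048576
    print(per_bandit_state_count(n, S))  # 40
    print(bound_ratio(S, n, K), math.sqrt(S))
```

For example, with $n = 10$ bandits of $S = 4$ states each, the product MDP already has $4^{10} = 1{,}048{,}576$ states, whereas a learner that exploits the bandit structure indexes only $40$ (bandit, state) pairs; the ratio between the two regret bounds is $\sqrt{4} = 2$ regardless of $n$ and $K$.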
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi...
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogen...
This paper introduces and addresses a wide class of stochastic bandit problems...
This paper considers the use of a simple posterior sampling algorithm to balance between exploration...
Most provably-efficient reinforcement learning algorithms introduce optimism about poorly-understoo...
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...
We study online reinforcement learning in linear Markov decision processes with adversarial losses a...
We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under dri...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
This thesis investigates sequential decision making tasks that fall in the framework of reinforcemen...
Reinforcement learning (RL) has gained increasing interest in recent years, being expected to del...
Following the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitr...