We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. We first develop the Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence Widening (SWUCRL2-CW) algorithm, and establish its dynamic regret bound when the variation budgets are known. In addition, we propose the Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune the SWUCRL2-CW algorithm to achieve the same dynamic regret bound, but in a parameter-free manne...
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is t...
The problem of reinforcement learning in a non-Markov environment is explored using a dynami...
Humans frequently overestimate the likelihood of desirable events while underestimating the likeliho...
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic no...
Reinforcement learning (RL) studies the problem where an agent maximizes its cumulative reward throu...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
We study learning algorithms for the classical Markovian bandit problem with discount. We explain ho...
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic no...
We consider a learning problem where the decision maker interacts with a standard Markov decision pr...
Reinforcement Learning (RL) in either fully or partially observable domains usually poses a requirem...
Reinforcement learning (RL) has gained an increasing interest in recent years, being expected to del...
Reinforcement learning (RL) has emerged as a general-purpose technique for addressing problems invol...
Sequential decision-making abounds in real-world problems ranging from robots needing to interact ...
The paper investigates the possibility of applying value function based reinforcement learning (RL)...
The problem of reinforcement learning in an unknown and discrete Markov Decisi...