We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time, subject to a budget on their cumulative variations. When this variation budget is known a priori, we propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish dynamic regret bounds for both. Based on these results, we further present a meta-algorithm that does not require any prior knowledge of the variation budget and can adaptively detect non-stationarity in the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-se...
We consider a class of sequential decision making problems in the presence of uncertainty, which bel...
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision pr...
We consider a learning problem where the decision maker interacts with a standard Markov decision pr...
We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under dri...
Stochastic sequential decision-making problems are generally modeled and solved as Markov decision p...
Reinforcement learning in non-stationary environments is generally regarded as a very difficult prob...
The paper investigates the possibility of applying value function based reinforcement learning (RL)...
Reinforcement learning (RL) studies the problem where an agent maximizes its cumulative reward throu...
Reinforcement learning (RL) has emerged as a general-purpose technique for addressing problems invol...
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are thos...
We derive a family of risk-sensitive reinforcement learning methods for agents, who face sequential ...
We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-h...
This paper considers sequential decision making problems under uncertainty, the tradeoff between the...