We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time, subject to a budget on their cumulative variation. When this variation budget is known a priori, we propose two restart-based algorithms, Restart-RSMB and Restart-RSQ, and establish their dynamic regret bounds. Building on these results, we further present a meta-algorithm that requires no prior knowledge of the variation budget and adaptively detects non-stationarity in the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-sensitive RL.
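For reference, the two quantities this abstract leans on admit standard definitions; the following is a minimal sketch in common notation from this literature (the symbols $\beta$, $r_h^k$, $P_h^k$, $B_r$, and $B_p$ are illustrative conventions, not taken from the paper itself). The entropic risk measure of a random return $X$ with risk parameter $\beta \neq 0$ is

\[
\rho_\beta(X) \;=\; \frac{1}{\beta} \log \mathbb{E}\!\left[ e^{\beta X} \right],
\]

which is risk-seeking for $\beta > 0$, risk-averse for $\beta < 0$, and recovers the risk-neutral objective $\mathbb{E}[X]$ as $\beta \to 0$. A variation budget over $K$ episodes with horizon $H$ is typically imposed as

\[
\sum_{k=1}^{K-1} \sum_{h=1}^{H} \sup_{s,a} \bigl| r_h^{k+1}(s,a) - r_h^k(s,a) \bigr| \;\le\; B_r,
\qquad
\sum_{k=1}^{K-1} \sum_{h=1}^{H} \sup_{s,a} \bigl\| P_h^{k+1}(\cdot \mid s,a) - P_h^k(\cdot \mid s,a) \bigr\|_1 \;\le\; B_p,
\]

so that, under this kind of assumption, the restart period of a restart-based algorithm can be tuned to the total budget $B_r + B_p$.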
In this work, we propose KeRNS: an algorithm for episodic reinforcement learni...
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision pr...
The main contribution of this paper consists in extending several non-st...
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under dri...
Reinforcement learning (RL) studies the problem where an agent maximizes its cumulative reward throu...
We derive a family of risk-sensitive reinforcement learning methods for agents who face sequential ...
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision proces...
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems u...
This paper considers sequential decision-making problems under uncertainty, the tradeoff between the...
Stochastic sequential decision-making problems are generally modeled and solved as Markov decision p...
Reinforcement learning (RL) has emerged as a general-purpose technique for addressing problems invol...
In most Reinforcement Learning (RL) studies, the considered task is assumed to be stationary, i.e., ...
We develop a framework for risk-sensitive behaviour in reinforcement learning (RL) due to uncertaint...
The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimize...