We describe how to use robust Markov decision processes for value function approximation with state aggregation. The robustness serves to reduce the sensitivity to the approximation error of sub-optimal policies in comparison to classical methods such as fitted value iteration. This reduces the bounds on the γ-discounted infinite-horizon performance loss by a factor of 1/(1 − γ) while preserving polynomial-time computational complexity. Our experimental results show that using the robust representation can significantly improve the solution quality with minimal additional computational cost.
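The abstract above centers on solving a robust MDP over aggregated states, where each aggregate state carries an uncertainty set of transition models and the value update takes the worst case over that set. As a rough illustration only (not the paper's actual algorithm), a minimal robust value iteration over a finite uncertainty set might look like the following; the function name, data layout, and shapes are all assumptions:

```python
import numpy as np

def robust_value_iteration(P, R, gamma=0.9, n_iter=200):
    """Hypothetical sketch of robust value iteration with state aggregation.

    P: list over aggregate states; P[s] has shape (n_models, n_actions, n_agg),
       one transition row per model in the uncertainty set of aggregate state s
       (e.g. the rows of the original states grouped into s).
    R: rewards, shape (n_agg, n_actions).
    Returns the robust (worst-case) value function over aggregate states.
    """
    n_agg, n_actions = R.shape
    V = np.zeros(n_agg)
    for _ in range(n_iter):
        V_new = np.empty(n_agg)
        for s in range(n_agg):
            # worst case over models in the uncertainty set,
            # then best over actions (robust Bellman update)
            q = R[s] + gamma * np.min(P[s] @ V, axis=0)  # shape (n_actions,)
            V_new[s] = q.max()
        V = V_new
    return V
```

Taking the minimum over models before the maximum over actions is what distinguishes this from ordinary value iteration; with a singleton uncertainty set per state it reduces to the classical Bellman update.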
Policy robustness in Reinforcement Learning (RL) may not be desirable at any price; the alterations ...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
General purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observation...
We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robu...
Reinforcement learning (RL) has become a highly successful framework for learning in Markov decision...
Applying the reinforcement learning methodology to domains that involve risky decisions like medicin...
Markov decision processes (MDPs) are a standard modeling tool for sequential decision making in a dyna...
We consider a Reinforcement Learning setup without any (esp. MDP) assumptions on the environment. St...
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...
The goals of perturbation analysis (PA), Markov decision processes (MDPs), and reinforcement learnin...
We consider a reinforcement learning setting where the learner does not have e...
As planned, at the beginning of the project I’ve been concentrating on the topic of online aggregati...
Model-Based Reinforcement Learning (MBRL) algorithms solve sequential decision-making problems, usua...
Reinforcement learning is a family of machine learning algorithms, in which the system learns to mak...
In these notes we will tackle the problem of finding optimal policies for Markov decision processes ...