Abstract: We introduce the Optimal Sample Selection (OSS) meta-algorithm for solving discrete-time optimal control problems. This meta-algorithm maps the problem of finding a near-optimal closed-loop policy to that of identifying a small set of one-step system transitions that leads to a high-quality policy when used as input to a batch-mode Reinforcement Learning (RL) algorithm. We detail a particular instance of the OSS meta-algorithm that uses tree-based Fitted Q-Iteration as the batch-mode RL algorithm and Cross-Entropy search to navigate efficiently the space of sample sets. The results show that this instance of OSS rapidly identifies small sample sets that lead to high-quality policies.
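To make the idea concrete, the following is a minimal sketch of cross-entropy search over sample sets, and not the authors' implementation. Everything specific in it is an assumption made for illustration: the five-state chain problem, the tabular Q-iteration used as a stand-in for tree-based Fitted Q-Iteration, the Bernoulli inclusion probabilities, the size penalty that favors small sets, and all function and parameter names.

```python
import random

GAMMA = 0.9
N_STATES = 5            # hypothetical chain 0..4; state 4 is an absorbing goal
ACTIONS = (-1, +1)

def step(s, a):
    """Toy deterministic dynamics: reward 1 on first entering the goal."""
    if s == N_STATES - 1:
        return 0.0, s
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == N_STATES - 1 else 0.0), s2

# Candidate pool of one-step transitions (s, a, r, s').
POOL = [(s, a, *step(s, a)) for s in range(N_STATES) for a in ACTIONS]

def batch_q_iteration(samples, n_iter=20):
    """Tabular stand-in for a batch-mode RL algorithm (assumption)."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_iter):
        new_q = dict(q)
        for s, a, r, s2 in samples:
            new_q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q = new_q
    return q

def score(mask, horizon=10, size_penalty=0.01):
    """Discounted return of the greedy policy learned from the selected
    samples, minus an assumed penalty per sample to favor small sets."""
    samples = [t for t, keep in zip(POOL, mask) if keep]
    q = batch_q_iteration(samples)
    s, ret = 0, 0.0
    for t in range(horizon):
        a = max(ACTIONS, key=lambda b: q[(s, b)])  # ties go to the first action
        r, s = step(s, a)
        ret += GAMMA ** t * r
    return ret - size_penalty * len(samples)

def cross_entropy_search(n_iter=30, pop=60, elite_frac=0.2, alpha=0.7, seed=0):
    """Cross-entropy search over binary inclusion masks of POOL."""
    rng = random.Random(seed)
    probs = [0.5] * len(POOL)          # Bernoulli inclusion probabilities
    best_mask, best_score = None, float("-inf")
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(n_iter):
        masks = [[rng.random() < p for p in probs] for _ in range(pop)]
        scored = sorted(((score(m), m) for m in masks),
                        key=lambda x: x[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_mask = scored[0]
        elites = [m for _, m in scored[:n_elite]]
        for i in range(len(probs)):
            # Move each probability toward the elite inclusion frequency.
            freq = sum(m[i] for m in elites) / n_elite
            probs[i] = (1 - alpha) * probs[i] + alpha * freq
    return best_mask, best_score, probs
```

Calling `cross_entropy_search()` returns the best mask found, its score, and the final inclusion probabilities. In this toy problem only the four forward transitions from states 0 to 3 are needed to reach the goal, so a well-behaved search should concentrate probability on them.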
We consider the problem o...
Reinforcement learning (RL) is a general framework for learning and evaluating intelligent behaviors...
A linear programming formulation of the optimal stopping problem for Markov decision processes is ap...
Peer reviewed. We introduce the Optimal Sample Selection (OSS) meta-algorithm for solving discrete-tim...
A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the envi...
Reinforcement learning aims to determine an optimal control policy from interaction with a system or...
Many computational problems can be solved by multiple algorithms, with different algorithms fastest ...
Batch mode reinforcement learning (BMRL) is a field of research which focuses on the inference of hi...
Reinforcement learning is a hard problem and the majority of the existing algorithms suffer from poo...
Control engineering researchers are increasingly embracing data-driven techniques like reinforcement...
International audience. This paper addresses the problem of batch Reinforcement Learning with Expert D...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
Markov decision processes (MDPs) are an established framework for solving sequential decision-makin...
We propose and analyze iterative algorithms that are computationally efficient, statistically sound ...
In this paper, we consider the batch mode reinforcement learning setting, where the central...