This paper introduces an algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is ...
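The abstract above describes cross-entropy policy search: a policy is parameterized by basis-function (BF) centers plus a discrete action per BF, and candidates are scored by their empirical return from a representative set of initial states. The following is a minimal illustrative sketch of that scheme, not the paper's implementation: the toy 1-D dynamics, the nearest-center policy, and all names (`empirical_return`, `cem_search`) are assumptions chosen for brevity.

```python
import numpy as np

# Hypothetical toy problem: state x in R, discrete actions {-1, +1},
# dynamics x' = x + 0.1*a, reward -|x|.  The policy assigns one action to
# each BF center and acts with the action of the center nearest to x.
ACTIONS = np.array([-1.0, 1.0])
INIT_STATES = np.array([-1.0, -0.5, 0.5, 1.0])  # representative initial states

def empirical_return(params, horizon=30, gamma=0.95):
    """Mean discounted return of the nearest-BF policy over INIT_STATES.
    params = [c1, c2, l1, l2]: two BF centers and two action logits
    (the sign of each logit selects the action for that BF)."""
    centers, logits = params[:2], params[2:]
    acts = ACTIONS[(logits > 0).astype(int)]
    total = 0.0
    for x0 in INIT_STATES:
        x, ret, disc = x0, 0.0, 1.0
        for _ in range(horizon):
            a = acts[np.argmin(np.abs(centers - x))]
            x = x + 0.1 * a
            ret += disc * -abs(x)
            disc *= gamma
        total += ret
    return total / len(INIT_STATES)

def cem_search(n_iter=20, pop=50, elite=10, seed=0):
    """Cross-entropy method: sample policies from a Gaussian over the
    parameters, keep the elite fraction by empirical return, refit."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(4), np.full(4, 2.0)
    best_params, best_score = None, -np.inf
    for _ in range(n_iter):
        samples = rng.normal(mean, std, size=(pop, 4))
        scores = np.array([empirical_return(s) for s in samples])
        order = np.argsort(scores)[::-1]
        if scores[order[0]] > best_score:
            best_score, best_params = scores[order[0]], samples[order[0]]
        elites = samples[order[:elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return best_params, best_score
```

Running `cem_search()` typically recovers the intuitive solution (a center on each side of the origin, steering back toward it), illustrating how optimizing BF locations jointly with action assignments gives the flexibility the abstract mentions.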
This paper presents the first ever approach for solving continuous-observation Decentralized Partial...
Continuous-time Markov decision processes are an important class of models in a wide range of applic...
We introduce the Optimal Sample Selection (OSS) meta-algorithm for solving discrete-tim...
This paper introduces a novel algorithm for approximate policy search in continuous-sta...
Markov decision process (MDP) models provide a unified framework for modeling and describing sequent...
Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multi...
In this paper, we consider a modified version of the control problem in a model free Markov decision...
As an important approach to solving complex sequential decision problems, reinforcement lea...
This paper presents a novel algorithm for learning in a class of stochastic Markov decision process...
In this paper, we consider the control problem in a reinforcement learning setting with large state ...
Reinforcement Learning methods for controlling stochastic processes typically assume ...
Many policy gradient methods are variants of Actor-Critic (AC), where a value function (critic) is l...
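The entry above concerns Actor-Critic (AC) methods, where a learned value function (critic) serves as a baseline for the policy (actor) update. Below is a minimal illustrative sketch of one-step actor-critic on a two-armed bandit treated as a one-state MDP; the setup and the name `train_actor_critic` are assumptions for illustration, not taken from any of the listed papers.

```python
import numpy as np

# Hypothetical toy task: arm 0 pays reward 1, arm 1 pays 0.  The critic is
# a single scalar value v used as a baseline; the actor is a softmax over
# two action preferences theta, updated with the TD error.

def train_actor_critic(steps=2000, alpha=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)   # actor: action preferences
    v = 0.0               # critic: value of the single state
    for _ in range(steps):
        pi = np.exp(theta - theta.max())
        pi /= pi.sum()                  # softmax policy
        a = rng.choice(2, p=pi)
        r = 1.0 if a == 0 else 0.0
        td = r - v                      # TD error (episodic, no next state)
        v += alpha * td                 # critic update
        grad = -pi
        grad[a] += 1.0                  # gradient of log pi(a)
        theta += alpha * td * grad      # actor update, critic as baseline
    return theta, v
```

After training, the preference for the paying arm dominates, showing the variance-reduction role the critic plays in AC-style policy gradients.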
This dissertation investigates the problem of representation discovery in discrete Markov decision p...
Supervisor: Dr. Vicenç Gómez Cerdà; Co-Supervisor: Dr. Mario Ceresa. Master's thesis of: Master ...
The framework of dynamic programming (DP) and reinforcement learning (RL) can be used to express imp...