peer reviewedWe cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite and finite horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm like Sarsa(1) that can be applied both in MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools that permit the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem demonstrated encouraging...
Using the expression for the unnormalized nonlinear filter for a hidden Markov model, we develop a d...
We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDP...
A probabilistic reinforcement learning algorithm is presented for finding control policies in contin...
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabili...
We consider the most realistic reinforcement learning setting in which an agent starts in an unknown...
Statistical learning with missing or hidden information is ubiquitous in many practical problems. Fo...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
Gradient-based approaches to direct policy search in reinforcement learning have received much rece...
In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic varian...
Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problem...
Planning plays an important role in the broad class of decision theory. Planning has drawn much atte...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...
We present a model-free reinforcement learning method for partially observable Markov decision probl...
Abstract. We present a model-free reinforcement learning method for partially observable Markov deci...
We present a model-free reinforcement learning method for partially observable Markov decision probl...
Using the expression for the unnormalized nonlinear filter for a hidden Markov model, we develop a d...
We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDP...
A probabilistic reinforcement learning algorithm is presented for finding control policies in contin...
We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabili...
We consider the most realistic reinforcement learning setting in which an agent starts in an unknown...
Statistical learning with missing or hidden information is ubiquitous in many practical problems. Fo...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
Gradient-based approaches to direct policy search in reinforcement learning have received much rece...
In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic varian...
Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problem...
Planning plays an important role in the broad class of decision theory. Planning has drawn much atte...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...
We present a model-free reinforcement learning method for partially observable Markov decision probl...
Abstract. We present a model-free reinforcement learning method for partially observable Markov deci...
We present a model-free reinforcement learning method for partially observable Markov decision probl...
Using the expression for the unnormalized nonlinear filter for a hidden Markov model, we develop a d...
We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDP...
A probabilistic reinforcement learning algorithm is presented for finding control policies in contin...