Abstract. We consider the active learning problem of inferring the transition model of a Markov Decision Process by acting and observ-ing transitions. This is particularly useful when no reward function is a priori defined. Our proposal is to cast the active learning task as a util-ity maximization problem using Bayesian reinforcement learning with belief-dependent rewards. After presenting three possible performance criteria, we derive from them the belief-dependent rewards to be used in the decision-making process. As computing the optimal Bayesian value function is intractable for large horizons, we use a simple algorithm to ap-proximately solve this optimization problem. Despite the sub-optimality of this technique, we show experimental...
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and t...
Solving Markov decision processes (MDPs) efficiently is challenging in many cases, for example, when...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...
International audienceWe consider the active learning problem of inferring the transition model of a...
People are efficient when they make decisions under uncertainty, even when their decisions have long...
Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elega...
Abstract The problem of reinforcement learning in a non-Markov environment isexplored using a dynami...
AbstractActing in domains where an agent must plan several steps ahead to achieve a goal can be a ch...
University of Minnesota M.S. thesis. June 2012. Major: Computer science. Advisor: Prof. Paul Schrate...
General purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observation...
A key challenge in many reinforcement learning problems is delayed rewards, which can significantly ...
We present a Bayesian reinforcement learning model with a working memory module which can solve some...
In recent work, Bayesian methods for exploration in Markov decision processes (MDPs) and for solving...
Abstract—This paper presents the Bayesian Optimistic Plan-ning (BOP) algorithm, a novel model-based ...
This dissertation considers a particular aspect of sequential decision making under uncertainty in w...
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and t...
Solving Markov decision processes (MDPs) efficiently is challenging in many cases, for example, when...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...
International audienceWe consider the active learning problem of inferring the transition model of a...
People are efficient when they make decisions under uncertainty, even when their decisions have long...
Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elega...
Abstract The problem of reinforcement learning in a non-Markov environment isexplored using a dynami...
AbstractActing in domains where an agent must plan several steps ahead to achieve a goal can be a ch...
University of Minnesota M.S. thesis. June 2012. Major: Computer science. Advisor: Prof. Paul Schrate...
General purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observation...
A key challenge in many reinforcement learning problems is delayed rewards, which can significantly ...
We present a Bayesian reinforcement learning model with a working memory module which can solve some...
In recent work, Bayesian methods for exploration in Markov decision processes (MDPs) and for solving...
Abstract—This paper presents the Bayesian Optimistic Plan-ning (BOP) algorithm, a novel model-based ...
This dissertation considers a particular aspect of sequential decision making under uncertainty in w...
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and t...
Solving Markov decision processes (MDPs) efficiently is challenging in many cases, for example, when...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...