Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms either need to make strong assumptions about the model dynamics (e.g. deterministic transitions) or assume access to an oracle for solving a hard optimistic planning or estimation problem as a subroutine. In this work we develop the first oracle-free learning algorithm for POMDPs under reasonable assumptions. Specifically, we give a quasipolynomial-time end-to-end algorithm for learning in "observable" POMDPs, where observability is the assumption that well-separated distributions over states induce well-sep...
We cast the Proactive Learning (PAL) problem—Active Learning (AL) with multiple reluctant, fallible,...
Acting in domains where an agent must plan several steps ahead to achieve a goal can be a ch...
Learning in Partially Observable Markov Decision Processes (POMDPs) is motivated by the essential need ...
Partially observable Markov decision processes (POMDPs) are interesting because they provide a gener...
Partially Observable Markov Decision Processes (POMDPs) provide a rich representation for agents act...
People are efficient when they make decisions under uncertainty, even when their decisions have long...
The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning dom...
Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequen...
We propose a new reinforcement learning algorithm for partially observable Markov decision processes...
We propose a new reinforcement learning algorithm for partially observable Markov decision processes...
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning contr...
In recent work, Bayesian methods for exploration in Markov decision processes (MDPs) and for solving...
Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elega...
A new algorithm for s...
The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but const...