We study model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment, measured by the average reward over an infinite horizon. We propose a learning algorithm for this problem that builds on spectral method-of-moments estimation for hidden Markov models, belief error control in POMDPs, and upper-confidence-bound methods for online learning. We establish a regret bound of $O(T^{2/3}\sqrt{\log T})$ for the proposed algorithm, where $T$ is the learning horizon. To the best of our knowledge, this is the first algorithm achieving sublinear regret with respect to our oracle for learning general POMDPs.
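As a brief illustration of why a rate of this form is sublinear, here is a sketch in notation that is assumed here rather than taken from the abstract: $R_T$ denotes the cumulative regret over $T$ steps and $C$ is an unspecified constant. For all sufficiently large $T$,

$$
\frac{R_T}{T} \;\le\; \frac{C\,T^{2/3}\sqrt{\log T}}{T} \;=\; C\,T^{-1/3}\sqrt{\log T} \;\longrightarrow\; 0 \quad \text{as } T \to \infty,
$$

so under this bound the per-step gap in average reward relative to the oracle vanishes as the learning horizon grows.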
The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning dom...
We consider an agent interacting with an environment in a single stream of act...
We study offline reinforcement learning (RL) in partially observable Markov decision processes. In p...
We propose a new reinforcement learning algorithm for partially observable Markov decision processes...
Much of reinforcement learning theory is built on top of oracles that are computationally hard to im...
Partially observable Markov decision processes (POMDPs) are interesting because they provide a gener...
We consider the problem of online reinforcement learning when several state re...
Reinforcement learning (RL) has gained increasing interest in recent years, being expected to del...
Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challengin...
Partially Observable Markov Decision Processes (POMDPs) provide a rich representation for agents act...