Intelligent agents can cope with sensory-rich environments by learning task-agnostic state abstractions. In this paper, we propose mechanisms to approximate causal states, which optimally compress the joint history of actions and observations in partially-observable Markov decision processes. Our proposed algorithm extracts causal state representations from RNNs that are trained to predict subsequent observations given the history. We demonstrate that these learned task-agnostic state abstractions can be used to efficiently learn policies for reinforcement learning problems with rich observation spaces. We evaluate agents using multiple partially observable navigation tasks with both discrete (GridWorld) and continuous (VizDoom, ALE) observ...
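The abstract above describes training an RNN to predict the next observation from the action-observation history and then using its internal state as a task-agnostic abstraction. A minimal illustrative sketch of that idea is below; it is not the paper's implementation, and all names, dimensions, and the random (untrained) weights are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical sketch: an RNN rolled over the joint action-observation
# history. Its final hidden state plays the role of the learned state
# abstraction; a linear readout predicts the next observation.

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, HID_DIM = 8, 4, 16  # illustrative sizes, not from the paper

# Randomly initialised weights stand in for parameters that would be
# trained with a next-observation prediction loss.
W_in = rng.normal(scale=0.1, size=(OBS_DIM + ACT_DIM, HID_DIM))
W_rec = rng.normal(scale=0.1, size=(HID_DIM, HID_DIM))
W_out = rng.normal(scale=0.1, size=(HID_DIM, OBS_DIM))


def encode_history(obs_seq, act_seq):
    """Roll the RNN over the history; return the final hidden state,
    which serves as the compressed state representation."""
    h = np.zeros(HID_DIM)
    for o, a in zip(obs_seq, act_seq):
        x = np.concatenate([o, a])
        h = np.tanh(x @ W_in + h @ W_rec)
    return h


def predict_next_obs(h):
    """Linear readout predicting the next observation from the state."""
    return h @ W_out


# Usage: encode a short random history, then predict the next observation.
history_obs = [rng.normal(size=OBS_DIM) for _ in range(5)]
history_act = [rng.normal(size=ACT_DIM) for _ in range(5)]
state = encode_history(history_obs, history_act)
prediction = predict_next_obs(state)
```

In this sketch the hidden state `state` is the quantity a downstream RL policy would consume in place of the raw history; the prediction head exists only to supply the training signal.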
Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurr...
Reinforcement learning aims to learn optimal policies from interaction with environment...
Perceived signals in real-world scenarios are usually high-dimensional and noisy, and finding and us...
In applying reinforcement learning to agents acting in the real world we are often faced with tasks ...
While deep reinforcement learning (DRL) has achieved great success in some large domains, most of th...
Faced with an ever-increasing complexity of their domains of application, artificial learning agents...
A very general framework for modeling uncertainty in learning environments is given by Partially Obs...
Typical “supervised ” and “unsupervised ” forms of machine learning are very specialized compared to...
In partially observable (PO) environments, deep reinforcement learning (RL) agents often suffer from...
Learning agents that interact with complex environments often cannot predict the exact outcome of th...
Techniques based on reinforcement learning (RL) have been used to build systems that learn t...
Consider the finite state graph that results from a simple, discrete, dynamical system in which an a...
General purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observatio...