In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on ϵ-greedy exploration, which subjects the user to randomly chosen actions during learning. Alternative approaches such as Gaussian Process SARSA (GP-SARSA) estimate uncertainties and sample actions, leading to a better user experience, but at the expense of greater computational complexity. This paper examines approaches for extracting uncertainty estimates from deep Q-networks (DQN) in the context of dialogue management. We perform a thorough analysis of Bayes-By-Backpropa...
Although Reinforcement Learning (RL) has been one of the most successful approaches for learning in ...
Uncertainty quantification has been extensively used as a means to achieve efficient directed explor...
Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-fr...
The optimization of dialogue policies using reinforcement learning (RL) is now an accepted part of t...
This paper studies directed exploration for reinforcement learning agents by tracking uncertainty ab...
Offline reinforcement learning, or learning from a fixed data set, is an attractive alternative to o...
In order for reinforcement learning techniques to be useful in real-world decision making processes,...
© 2018, the Authors. Reinforcement learning (RL) aims to resolve the sequential decision-making unde...
We present a new algorithm that significantly improves the efficiency of exploration for deep Q-lear...
Deep, model based reinforcement learning has shown state of the art, human-exceeding performance in ...
Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) ...
Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) enables a dialogue poli...
Uncertainty is ubiquitous in games, both in the agents playing games and often in the games themselv...
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of s...