For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of ...
Many real-life problems involve dealing with multiple objectives. For example, in network routing th...
In order to perform a large variety of tasks and achieve human-level performance in complex real-wor...
Typically, a reinforcement learning agent interacts with the environment and learns how to select an...
For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochasti...
Many real-world problems involve the optimization of multiple, possibly conflicting ob-jectives. Mul...
A common approach to address multiobjective problems using reinforcement learning methods is to exte...
This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorith...
The solution for a Multi-Objetive Reinforcement Learning problem is a set of Pareto optimal policie...
Markov decision processes are sequential decision-making processes in which the learning agent sense...
In this talk we present PQ-learning, a new Reinforcement Learning (RL) algorithm that determines th...
Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problem...
Eco-driving involves adaptively changing the speed of the vehicle to ensure minimal fuel consumption...
Reinforcement learning is a family of machine learning algorithms, in which the system learns to mak...
The operation of large-scale water resources systems often involves several conflicting and noncomme...
This work describes MPQ-learning, an temporal-difference method that approximates the set of all non...
Many real-life problems involve dealing with multiple objectives. For example, in network routing th...
In order to perform a large variety of tasks and achieve human-level performance in complex real-wor...
Typically, a reinforcement learning agent interacts with the environment and learns how to select an...
For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochasti...
Many real-world problems involve the optimization of multiple, possibly conflicting ob-jectives. Mul...
A common approach to address multiobjective problems using reinforcement learning methods is to exte...
This paper describes a novel multi-objective reinforcement learning algorithm. The proposed algorith...
The solution for a Multi-Objetive Reinforcement Learning problem is a set of Pareto optimal policie...
Markov decision processes are sequential decision-making processes in which the learning agent sense...
In this talk we present PQ-learning, a new Reinforcement Learning (RL) algorithm that determines th...
Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problem...
Eco-driving involves adaptively changing the speed of the vehicle to ensure minimal fuel consumption...
Reinforcement learning is a family of machine learning algorithms, in which the system learns to mak...
The operation of large-scale water resources systems often involves several conflicting and noncomme...
This work describes MPQ-learning, an temporal-difference method that approximates the set of all non...
Many real-life problems involve dealing with multiple objectives. For example, in network routing th...
In order to perform a large variety of tasks and achieve human-level performance in complex real-wor...
Typically, a reinforcement learning agent interacts with the environment and learns how to select an...