Q-learning is a reinforcement learning algorithm that suffers from overestimation bias because it learns the optimal action values using a target that maximizes over uncertain action-value estimates. Although the overestimation bias of Q-learning is generally considered harmful, a recent study suggests that it can be either harmful or helpful depending on the reinforcement learning problem. In this paper, we propose a new Q-learning variant, called Variation-resistant Q-learning, to control and utilize estimation bias for better performance. First, we present the tabular version of the algorithm and mathematically prove its convergence. Second, we combine the algorithm with function approximation. Finally, we present empirical results from...
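The mechanism named above, a target that maximizes over uncertain estimates, can be illustrated with a minimal Python sketch. It assumes ten actions whose true values are all 0 and Gaussian estimation noise; both are illustrative choices, not taken from the paper, and `expected_max_estimate` is a hypothetical helper. Every individual estimate is unbiased, yet their maximum is biased upward.

```python
import random
import statistics

def expected_max_estimate(true_values, noise_std, n_trials, seed=0):
    """Average of max_a Q_hat(a) over many independent trials.

    Each Q_hat(a) is an unbiased estimate of true_values[a] corrupted by
    Gaussian noise (an illustrative assumption about the estimation error).
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_trials):
        estimates = [q + rng.gauss(0.0, noise_std) for q in true_values]
        maxima.append(max(estimates))  # maximizing over noisy estimates, as in the Q-learning target
    return statistics.fmean(maxima)

# Hypothetical setting: ten actions, all with true value 0, so the true max is 0.
true_q = [0.0] * 10
bias = expected_max_estimate(true_q, noise_std=1.0, n_trials=10_000) - max(true_q)
print(f"bias of the max over noisy estimates: {bias:.3f}")
```

With these settings the printed bias should be positive and roughly 1.5 (the expected maximum of ten standard normal draws); setting `noise_std` to 0 removes it, and a single action gives a bias near 0.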
Reinforcement learning (RL) algorithms solve a wide range of problems we face. The topic of RL ...
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It...
In this work we propose an approach for generalization in continuous domain Reinforcement Learning t...
Q-learning is a reinforcement learning algorithm that has overestimation bias, because it learns the...
In some stochastic environments the well-known reinforcement learning algorithm Q-learni...
Q-learning is a very popular reinforcement learning algorithm being proven to converge to optimal p...
Q-learning is a very popular reinforcement learning algorithm being proven to converge to optimal po...
In reinforcement learning, Q-learning is the best-known algorithm, but it suffers from overestimation...
Obtaining good value estimates is one of the key problems in reinforcement learning (RL). Curren...
Temporal-Difference off-policy algorithms are among the building blocks of reinforcement learning (R...
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overe...
Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to opti...
The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques....
Value-based reinforcement-learning algorithms have shown strong performance in games, robotics, and...
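Several of the works listed above concern controlling the overestimation bias of Q-learning. The Python sketch below is not Variation-resistant Q-learning, whose update rule is not given here; it contrasts standard tabular Q-learning with Double Q-learning (van Hasselt, 2010), a well-known bias-control technique, on a hypothetical toy chain S → X → T. The chain, `run`, `new_table`, `N_ACTIONS = 10`, `alpha = 0.1`, and `gamma = 1.0` are illustrative assumptions. The true value of Q(S, 0) is 0 because every action in X has mean reward 0.

```python
import random
import statistics
from collections import defaultdict

N_ACTIONS = 10  # hypothetical: number of actions available in the noisy state X

def new_table():
    # State -> list of action values; the terminal state "T" is never updated, so it stays 0.
    return defaultdict(lambda: [0.0] * N_ACTIONS)

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Standard Q-learning: one table both selects and evaluates the greedy next action.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def double_q_learning_update(QA, QB, s, a, r, s_next, alpha, gamma, rng):
    # Double Q-learning: pick the table to update at random; it selects the
    # greedy next action and the other table evaluates that action.
    if rng.random() < 0.5:
        QA, QB = QB, QA
    a_star = max(range(N_ACTIONS), key=lambda b: QA[s_next][b])
    target = r + gamma * QB[s_next][a_star]
    QA[s][a] += alpha * (target - QA[s][a])

def run(double, episodes=20_000, alpha=0.1, gamma=1.0, seed=0):
    """Mean learned value of (S, 0) over the last three quarters of training.

    Episode (hypothetical): S --(a=0, r=0)--> X --(uniform random b, r ~ N(0, 1))--> T.
    """
    rng = random.Random(seed)
    QA, QB = new_table(), new_table()
    history = []
    for ep in range(episodes):
        b = rng.randrange(N_ACTIONS)
        r = rng.gauss(0.0, 1.0)
        if double:
            double_q_learning_update(QA, QB, "S", 0, 0.0, "X", alpha, gamma, rng)
            double_q_learning_update(QA, QB, "X", b, r, "T", alpha, gamma, rng)
            value = (QA["S"][0] + QB["S"][0]) / 2
        else:
            q_learning_update(QA, "S", 0, 0.0, "X", alpha, gamma)
            q_learning_update(QA, "X", b, r, "T", alpha, gamma)
            value = QA["S"][0]
        if ep >= episodes // 4:
            history.append(value)
    return statistics.fmean(history)

print(f"Q-learning estimate of Q(S, 0):        {run(double=False):+.3f}")
print(f"Double Q-learning estimate of Q(S, 0): {run(double=True):+.3f}")
```

In this setup the Q-learning estimate should come out clearly positive, since its target takes the maximum over ten noisy estimates, while the Double Q-learning estimate should stay close to the true value 0, because the action is selected with one table and evaluated with an independently updated one.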