The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm to mitigate this bias. However, this comes at the price of an underestimation of action values, in addition to increased memory requirements and slower convergence. In this paper, we introduce a new way to address the maximization bias in the form of a "self-correcting algorithm" for approximating the maximum of an expected value. Our method balances the overestimation of the single estimator used in conventional Q-learning and the underestimation of the double estimator used in Double Q-...
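The two biases named in the abstract above can be illustrated with a small Monte Carlo sketch. It is not the self-correcting estimator of the paper, whose definition is not given in this excerpt; it only compares the single estimator (maximum over sample means) with the double estimator (select an action on one half of the samples, evaluate it on the other half). The action means, noise level, sample counts, and function names below are illustrative assumptions.

```python
import random
from statistics import fmean


def single_estimator(samples):
    """Maximum over per-action sample means (tends to overestimate)."""
    return max(fmean(s) for s in samples)


def double_estimator(samples):
    """Pick the best action on one half of the data, evaluate it on the other half."""
    half_a = [s[: len(s) // 2] for s in samples]
    half_b = [s[len(s) // 2:] for s in samples]
    best = max(range(len(samples)), key=lambda i: fmean(half_a[i]))
    return fmean(half_b[best])


def simulate(true_means, n_samples=20, noise_std=1.0, n_trials=5000, seed=0):
    """Average both estimators over many independent noisy data sets."""
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        # One noisy data set: n_samples Gaussian draws per action.
        samples = [
            [rng.gauss(mean, noise_std) for _ in range(n_samples)]
            for mean in true_means
        ]
        single_total += single_estimator(samples)
        double_total += double_estimator(samples)
    return single_total / n_trials, double_total / n_trials


# Hypothetical action values; the quantity being estimated is max(true_means).
true_means = [0.5, 0.4, 0.3, 0.2, 0.1]
single_avg, double_avg = simulate(true_means)
print(f"true max:         {max(true_means):.3f}")
print(f"single estimator: {single_avg:.3f}")
print(f"double estimator: {double_avg:.3f}")
```

Under these assumptions the single-estimator average should land above the true maximum and the double-estimator average below it, which is the gap a method that balances the two estimators aims to close.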
Q-learning can be used to find an optimal action-selection policy for any given finite Markov Decisi...
Q-learning is a very popular reinforcement learning algorithm being proven to converge to optimal po...
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Ma...
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It...
In some stochastic environments the well-known reinforcement learning algorithm Q-learni...
Q-learning is a reinforcement learning algorithm that has overestimation bias, because it learns the...
Various pathologies can occur when independent learners are used in cooperative Multi-Agent Reinforc...
Double Q-learning is a popular reinforcement learning algorithm in Markov decision process (MDP) pro...
Temporal-Difference off-policy algorithms are among the building blocks of reinforcement learning (R...
Value-based reinforcement-learning algorithms have shown strong performances in games, robotics, and...
In reinforcement learning, Q-learning is the best-known algorithm but it suffers from overestimation...
The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques....
Q-learning is a very popular reinforcement learning algorithm being proven to converge to optimal p...
Deep Q-learning Network (DQN) is a successful method that combines reinforcement learning with deep ne...