In reinforcement learning, Q-learning is the best-known algorithm, but it suffers from overestimation bias, which may lead to poor performance or unstable learning. In this paper, we present a novel analysis of this problem using various control tasks. To solve these tasks, Q-learning is combined with a multilayer perceptron (MLP), experience replay, and a target network. We focus our analysis on the effect of the learning rate when training the MLP. Furthermore, we examine whether decaying the learning rate over time has advantages over a static learning rate. Experiments were performed on various maze-solving problems, involving deterministic or stochastic transition functions and 2D or 3D grids, and on two OpenAI Gym control problems. We conducted...
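As a rough illustration of where the overestimation bias comes from (this is not the experimental setup of the paper), the following minimal sketch assumes a single state in which every action has a true value of 0 and each estimate is corrupted by independent Gaussian noise. The function name `mean_max_estimate` and all of its parameters are hypothetical choices made for this example. Because the Q-learning target takes a maximum over noisy estimates, the average of that maximum is positive even though no action is actually better than any other, and it grows with the number of actions.

```python
import random
import statistics


def mean_max_estimate(n_actions, n_trials=10_000, noise_std=1.0, seed=0):
    """Average of max_a Q_hat(a) when every true action value is 0.

    Assumption for illustration: Q_hat(a) = 0 + Gaussian noise, drawn
    independently per action. Any positive average is therefore pure
    maximization (overestimation) bias, not real value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    maxima = [
        max(rng.gauss(0.0, noise_std) for _ in range(n_actions))
        for _ in range(n_trials)
    ]
    return statistics.fmean(maxima)


if __name__ == "__main__":
    # The true maximum is 0 in every case; the printed averages show
    # how far the max over noisy estimates drifts above it.
    for n in (1, 2, 4, 8):
        print(n, round(mean_max_estimate(n), 3))
```

With a single action the average stays near 0, since there is nothing to maximize over. With more actions the estimate drifts upward, which is the effect that the learning rate and target network choices discussed above interact with.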
The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques....
If reinforcement learning (RL) techniques are to be used for "real world" dynamic system c...
In this paper, we place deep Q-learning into a control-oriented perspective and study its learning d...
Q-learning is a reinforcement learning algorithm that has overestimation bias, because it learns the...
In some stochastic environments the well-known reinforcement learning algorithm Q-learni...
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It...
Abstract. In this paper, we address an under-represented class of learning algorithms in the study o...
Q-learning is a very popular reinforcement learning algorithm being proven to converge to optimal po...
Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to opti...
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overe...
Temporal-Difference off-policy algorithms are among the building blocks of reinforcement learning (R...
Q-learning is a very popular reinforcement learning algorithm being proven to converge to optimal p...
Pac-Xon is an arcade video game in which the player tries to fill a level space by conquering blocks...