The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm, even though it may accelerate the learning process and help the agent avoid locally optimal policies. In this paper, finding the optimum policy in Q-learning is described as a search for the optimum solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and the modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments sh...
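The abstract does not spell out how the Metropolis criterion enters action selection, so the following is only a minimal sketch of one plausible reading. It assumes that a uniformly random action competes against the greedy action and is accepted with probability exp((Q(s, a_random) - Q(s, a_greedy)) / T). The names `metropolis_action` and `q_update`, the tabular list-of-lists Q-table, and the default values of `alpha` and `gamma` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def metropolis_action(q_row, temperature, rng=random):
    """Pick an action index from one state's Q-values with a Metropolis-style test.

    Assumed rule (not confirmed by the abstract): draw a random candidate
    action and accept it over the greedy action with probability
    exp((Q[candidate] - Q[greedy]) / temperature). temperature must be > 0.
    """
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))
    # The candidate is never better than the greedy action, so the exponent
    # is <= 0 and the acceptance probability lies in (0, 1].
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place to q."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
```

Under this reading, a high temperature makes the choice close to uniformly random (exploration), while a temperature near zero makes it almost always greedy (exploitation). Lowering the temperature over the course of training, for example by multiplying it by a constant factor below 1 after each episode, would shift the agent gradually from one regime to the other; that schedule is likewise an assumption here.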
This paper aims at integrating machine learning techniques into meta-heuristic...
Finding global optimum solution in minimum time from large search space is challenging due to involv...
We investigate the possibility to apply a known machine learning algorithm of Q-learning in the doma...
Abstract—The balance between exploration and exploitation is one of the key problems of action selec...
Dimirovski, Georgi M. (Dogus Author) -- Conference full title: 17th World Congress, International Fe...
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Ma...
Abstract — Q-learning is a technique used to compute an optimal policy for a controlled Markov chai...
Cooperative Q-learning approach allows multiple learners to learn independently then share their Q-v...
Q-learning can be used to find an optimal action-selection policy for any given finite Markov Decisi...
Reinforcement learning requires exploration, leading to repeated execution of sub-optimal actions. N...
For NP-hard optimisation problems no polynomial-time algorithms exist for finding a solution. Theref...
We extend the Q-learning algorithm from the Markov Decision Process setting to problems where observ...
Reinforcement learning is a hard problem and the majority of the existing algorithms suffer from poo...
Abstract. This article presents an algorithm that combines a FAST-based algorithm (Flexible Adaptabl...
Many computational problems can be solved by multiple algorithms, with different algorithms fastest ...